
Senior Site Reliability Engineer
Are you a highly skilled and experienced Senior Site Reliability Engineer looking for a new and exciting opportunity? Look no further! EPAM Systems is seeking a talented individual to join our team and help us drive the reliability and performance of our systems to new heights. As a Senior SRE, you will play a critical role in ensuring the availability, scalability, and efficiency of our software products. If you thrive in a fast-paced environment, have a passion for solving complex technical challenges, and possess strong leadership skills, we want to hear from you!
- Design and implement robust and scalable systems to ensure the reliability and performance of our software products.
- Lead and mentor a team of SREs to drive continuous improvement and innovation in system reliability and performance.
- Collaborate with cross-functional teams to identify and resolve technical issues and implement best practices.
- Develop and maintain monitoring tools and processes to proactively identify and address potential issues.
- Conduct regular performance and capacity assessments to identify areas for optimization and improvement.
- Develop and maintain disaster recovery plans and procedures to ensure business continuity.
- Act as a subject matter expert for all aspects of site reliability and performance, providing guidance and support to other teams.
- Stay up-to-date with industry trends and developments, and proactively identify opportunities for improvement and optimization.
- Communicate regularly with stakeholders and provide updates on system reliability and performance metrics.
- Participate in on-call rotations and respond to critical incidents, ensuring timely resolution and effective communication with relevant teams.
Extensive experience in designing, building, and maintaining highly available and scalable systems in a cloud environment, such as AWS or Azure.
Proficiency in at least one programming language, such as Python, Java, or Go, and experience with automation tools like Ansible or Terraform.
Strong knowledge of monitoring and logging tools, such as Datadog, Splunk, or the ELK Stack, and experience with implementing and maintaining a comprehensive monitoring strategy.
Proven experience in troubleshooting complex technical issues and implementing solutions to prevent future incidents.
Excellent communication and collaboration skills, with the ability to work with cross-functional teams and drive projects to completion.
Troubleshooting
DevOps
Scripting
Incident Management
Automation
Cloud Computing
Performance optimization
Disaster recovery
Configuration management
Infrastructure design
Monitoring
Security Management
Communication
Conflict Resolution
Emotional Intelligence
Leadership
Time management
Creativity
Teamwork
Adaptability
Problem-Solving
Decision-making
According to JobzMall, the average salary range for a Senior Site Reliability Engineer in Atlanta, GA, USA is $130,000 - $160,000 per year. This range can vary depending on the individual's experience, skills, and employer. Bonuses and other benefits may also affect the overall compensation package.
EPAM Systems, Inc. is a US company that specializes in product development, digital platform engineering, and digital and product design.

