
Senior Site Reliability Engineer
Are you a seasoned engineer with a passion for creating reliable and scalable systems? Do you thrive in fast-paced environments and enjoy solving complex technical challenges? Look no further! EPAM Systems is seeking a highly skilled Senior Site Reliability Engineer to join our talented team. As a Senior SRE, you will play a critical role in ensuring the stability, performance, and availability of our clients' applications. If you have a strong background in software development, a deep understanding of cloud infrastructure, and a continuous improvement mindset, we want to hear from you! Join us in driving innovation and excellence in the world of enterprise software.
- Design and implement reliable and scalable systems for EPAM Systems' clients.
- Collaborate with cross-functional teams to ensure the stability, performance, and availability of applications.
- Monitor and troubleshoot complex technical issues to identify and implement solutions.
- Continuously improve and optimize systems to enhance performance and reduce downtime.
- Stay up-to-date with industry trends and best practices in software development and cloud infrastructure.
- Develop and maintain documentation for system configurations, processes, and procedures.
- Communicate effectively with team members and clients to provide updates and recommendations on system improvements.
- Participate in on-call rotations to provide 24/7 support for critical systems.
- Mentor and coach junior engineers to develop their skills and knowledge.
- Collaborate with the product development team to integrate new features and functionalities into systems.
- Identify potential risks and proactively implement measures to mitigate them.
- Conduct regular performance and load testing to ensure system reliability and scalability.
- Evaluate and implement new tools and technologies to improve system performance and efficiency.
- Adhere to established security protocols and standards to ensure the protection of sensitive data.
- Maintain a high level of professionalism and uphold EPAM Systems' values and standards.
Extensive experience in managing large-scale production systems: A Senior Site Reliability Engineer at EPAM Systems should have a track record of managing complex and high-traffic production systems. This includes a deep understanding of distributed systems, networking, and infrastructure architecture.
Strong programming and automation skills: The ideal candidate should have a strong background in programming, with experience in at least one programming language like Python, Java, or Go. They should also have experience with automation tools like Ansible, Puppet, or Chef to automate deployment and management processes.
Proficiency in infrastructure and cloud technologies: A Senior Site Reliability Engineer should have a thorough understanding of various infrastructure technologies, including virtualization, containerization, and cloud platforms like AWS, GCP, or Azure. They should also be familiar with monitoring and logging tools like Prometheus, Grafana, and ELK Stack.
Excellent troubleshooting and problem-solving skills: In this senior role, a Site Reliability Engineer should be able to troubleshoot and resolve complex issues in a timely manner. They should have a strong analytical mindset and be able to think critically to identify the root cause of problems and implement effective solutions.
Communication and teamwork: A Senior Site Reliability Engineer should have excellent communication skills to collaborate with cross-functional teams and stakeholders, including developers, QA engineers, and project managers. They should also have experience in mentoring and leading junior team members to drive continuous improvement and innovation within the team.
Troubleshooting
DevOps
Scripting
Automation
Cloud Computing
Deployment
Disaster recovery
Performance tuning
Configuration management
Incident response
Monitoring
Infrastructure management
Communication
Conflict Resolution
Emotional Intelligence
Leadership
Time management
Creativity
Attention to detail
Teamwork
Adaptability
Problem-Solving
According to JobzMall, the average salary range for a Senior Site Reliability Engineer in Houston, TX, USA is $130,000 - $160,000 per year. This can vary depending on factors such as experience, skills, and the specific company or industry. Some companies may offer higher salaries or additional benefits to attract top talent.
EPAM Systems, Inc. is a US company that specializes in product development, digital platform engineering, and digital and product design.

