
Senior Site Reliability Engineer
Are you a highly skilled and experienced Site Reliability Engineer looking for a new challenge? Do you have a passion for designing and implementing robust, scalable systems? EPAM Systems is seeking a Senior Site Reliability Engineer to join our team. In this position, you will play a critical part in ensuring the reliability and performance of our systems, working closely with cross-functional teams to identify and resolve issues. If you have a strong background in DevOps and are looking for an opportunity to make a significant impact, we want to hear from you.
- Design and implement scalable and reliable systems to ensure optimal performance and availability.
- Collaborate with cross-functional teams to identify and resolve any issues related to system reliability.
- Develop and maintain monitoring and alerting systems to proactively identify and address potential problems.
- Create and maintain documentation for system configurations, processes, and procedures.
- Participate in the planning and execution of system upgrades and deployments.
- Utilize DevOps principles and practices to continuously improve system reliability and efficiency.
- Troubleshoot and resolve complex technical issues related to system performance and availability.
- Implement disaster recovery and business continuity plans to ensure system resilience.
- Keep up-to-date with industry best practices and technologies related to system reliability and performance.
- Mentor and guide junior team members to enhance their skills and knowledge.
- Communicate effectively with stakeholders to provide updates on system performance and reliability.
- Proactively identify potential risks and provide recommendations for mitigation.
- Participate in on-call rotations to provide 24/7 support for critical systems.
- Contribute to the development and maintenance of automation tools to streamline processes and improve efficiency.
- Collaborate with other teams to design and implement new features and enhancements to improve overall system reliability and performance.
Extensive experience in cloud computing: A Senior Site Reliability Engineer at EPAM Systems should have a deep understanding of cloud computing technologies, such as AWS, Azure, and Google Cloud. They should have experience in designing, implementing, and managing highly available and scalable cloud-based solutions.
Proficiency in DevOps tools and practices: EPAM Systems values a strong DevOps culture, and a Senior Site Reliability Engineer should possess expertise in various DevOps tools and practices. This includes configuration management tools like Chef and Ansible, containerization tools like Docker and Kubernetes, and CI/CD tools like Jenkins and GitLab.
Strong knowledge of Linux and scripting languages: A Senior Site Reliability Engineer should have a solid understanding of Linux operating systems, including advanced troubleshooting skills. They should also be proficient in at least one scripting language, such as Bash, Python, or Ruby, to automate routine tasks and processes.
Experience in monitoring and alerting: As part of their role in maintaining system availability, a Senior Site Reliability Engineer should have experience in implementing and managing monitoring and alerting systems. This includes tools like Prometheus, Grafana, and New Relic, as well as knowledge of industry-standard monitoring practices.
Leadership and collaboration skills: A Senior Site Reliability Engineer at EPAM Systems is expected to lead and mentor other team members and collaborate with cross-functional teams. They should possess strong communication, problem-solving, and decision-making skills to effectively manage complex projects and drive continuous improvement within the organization.
Troubleshooting
DevOps
Scripting
Database
Automation
Cloud Infrastructure
Disaster Recovery
Performance Tuning
Configuration Management
System Monitoring
Security Management
Infrastructure as Code
Communication
Conflict Resolution
Emotional Intelligence
Leadership
Time Management
Creativity
Critical Thinking
Teamwork
Adaptability
Problem-Solving
According to JobzMall, the average salary for a Senior Site Reliability Engineer in Philadelphia, PA, USA is between $129,000 and $180,000 per year. This may vary depending on factors such as experience, company, and industry.
EPAM Systems, Inc. is a US company that specializes in product development, digital platform engineering, and digital and product design.
