ByteDance

Site Reliability Engineer, Edge Services

Boston, MA, USA
Full-Time | Depends on Experience | Senior Level | Masters
Job Description

Are you a highly skilled and motivated individual with a passion for ensuring the availability and performance of large-scale systems? Do you thrive in a fast-paced and dynamic environment, constantly seeking new challenges and opportunities to innovate? If so, ByteDance is looking for a Site Reliability Engineer for our Edge Services team to help us build and maintain a highly available and scalable infrastructure for our global user base. As a key member of our team, you will play a critical role in optimizing the reliability, performance, and efficiency of our services, ensuring an exceptional user experience for millions of users around the world. Join us and be a part of our mission to inspire creativity and bring joy to our users through our innovative technology.

  1. Develop and maintain a highly available and scalable infrastructure for large-scale systems.
  2. Monitor and troubleshoot system performance and availability issues to ensure a seamless user experience.
  3. Collaborate with cross-functional teams to design and implement solutions for improving system reliability and performance.
  4. Automate deployment processes and develop tools for efficient system management.
  5. Conduct regular system audits and implement best practices for system security and data protection.
  6. Continuously evaluate and improve system processes to optimize efficiency and cost-effectiveness.
  7. Stay up-to-date with industry trends and developments in site reliability engineering to propose and implement innovative solutions.
  8. Participate in on-call rotation to provide 24/7 support for critical system issues.
  9. Identify and mitigate potential risks to system stability and performance.
  10. Document system configurations, processes, and procedures for knowledge sharing and training purposes.
  11. Collaborate with other teams to ensure seamless integration of new services and features into the existing infrastructure.
  12. Mentor and guide junior team members to develop their skills and knowledge in site reliability engineering.
  13. Communicate effectively with team members and stakeholders to provide updates on system status and any potential issues.
  14. Adhere to company policies and procedures, as well as industry standards and regulations.
  15. Contribute to a positive and collaborative work environment, promoting teamwork and a culture of innovation.
Where is this job?
This job is located in Boston, MA, USA.
Job Qualifications
  • In-depth knowledge of cloud computing technologies, such as AWS, Azure, and Google Cloud Platform, and experience in managing and optimizing large-scale distributed systems.

  • Strong proficiency in scripting and programming languages, including Python, Java, or Go, and experience with automation and configuration management tools like Terraform, Puppet, or Chef.

  • Extensive experience in troubleshooting and resolving complex system issues, including network, server, and application performance, and a deep understanding of monitoring and logging tools like Prometheus, ELK, or Datadog.

  • Proven track record of working in a fast-paced, high-availability environment, with a focus on reliability and scalability, and experience in designing and implementing disaster recovery and business continuity plans.

  • Excellent communication and collaboration skills, with the ability to work closely with cross-functional teams, including developers, infrastructure engineers, and product managers, to identify and address technical challenges and drive continuous improvement.

Required Skills
  • Security

  • Networking

  • Troubleshooting

  • DevOps

  • Scripting

  • Automation

  • Cloud Computing

  • Linux Administration

  • Load Balancing

  • Monitoring

  • Scalability

  • CDN Management

Soft Skills
  • Communication

  • Conflict Resolution

  • Customer Service

  • Emotional Intelligence

  • Leadership

  • Time Management

  • Creativity

  • Teamwork

  • Adaptability

  • Problem-Solving

Compensation

According to JobzMall, the average salary range for a Site Reliability Engineer, Edge Services in Boston, MA, USA is between $100,000 and $160,000 per year. This range can vary depending on factors such as the company, experience level, and specific job responsibilities.

Additional Information
ByteDance is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
Required Languages: English
Job Posted: March 4th, 2025
Apply Before: June 9th, 2026
This job posting is from a verified source. 
Reposted

About ByteDance

ByteDance is a technology company operating a range of content platforms that inform, educate, entertain, and inspire people across languages, cultures, and geographies. Dedicated to building global platforms of creation and interaction, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages, including TikTok, Helo, Vigo Video, Douyin, and Huoshan.
