OFFICE: DXC Technology, Hyderabad, Telangana
As an SRE/DevOps Engineer you will be part of a team that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems deployed to the cloud. You’ll have oversight of how systems relate to each other, limit time spent on operational tasks, automate wherever possible, carry out blameless post-mortems and proactively identify potential outages, continually iterating to make improvements.
You will be working as part of a team of first-class engineers supporting Azure cloud infrastructure accounts. The role will require you to work closely with the customer team to create and maintain cloud environments, whilst acting as an escalation point for wider support teams. You will champion the use of Infrastructure as Code to ensure deployments are delivered correctly, as well as being managed, maintained and cost-optimised. You will instil a culture of adopting engineering best practices on contemporary cloud platforms.
You’ll be responsible for improving the whole service lifecycle, from inception and design through to deployment, operation and refinement. You will support services before they go live through system design consulting, developing software platforms and frameworks, capacity planning and launch reviews. You will also be responsible for monitoring and measuring availability, latency and overall system health, and for providing scalability. You will provide input into technical discussions and decisions and help to mentor junior members.
- Must have 3 years of experience in an SRE role in a serverless environment in Azure
- Experience with distributed systems design, maintenance, and troubleshooting
- Experience with IaC (Infrastructure as Code), DevOps, CI/CD and modern tooling such as Terraform, Concourse and Jenkins
- Run infrastructure with Chef, Terraform and Kubernetes.
- Ensure monitoring and alerting trigger on symptoms rather than on outages.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Monitoring and tracing experience – you’ll be using Prometheus and Honeycomb
- At least one Azure certification at associate level
- Ability to script (or, ideally, code) in Python, Go, Perl, Ruby, C, C++ or Java
- Good knowledge of algorithms, data structures, complexity analysis and software design
- Experience with DevOps environments and containerisation (Docker, Kubernetes)
- Excellent communication skills; collaborative and personable – happy to help take a lead on projects and provide mentoring
- Have a strong understanding of network protocols and identity and access management.
- Manage and maintain cloud environments to ensure they are secure and conform to best practice
- Provide reporting on the availability of key systems and provide third-line troubleshooting of cloud issues
- Devise and implement process improvements to resolve outstanding faults or provide new functionality
- Deal with all vulnerability and audit findings, ensuring requirements are addressed and completed within set deadlines
- Reliable and consistent decision maker
- Collaborate and communicate asynchronously. Document all the things so you don’t need to learn the same thing twice.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
- There may be an infrequent business need to provide after-hours or weekend support for critical incidents or planned upgrade work.