A recent Dynatrace survey of 450 site reliability engineers (SREs) shows that their role is now better understood and increasingly strategic.
Having first emerged among the major public cloud providers, the profession of site reliability engineer (SRE) has become widespread and is now found in most large organizations that use cloud technologies. These engineers are responsible for ensuring the reliability and security of infrastructures that are increasingly defined as code, in particular by defining service level objectives and through automation. However, their role has not always been well understood. This is changing, as shown by a recent survey of 450 SREs worldwide by the software vendor Dynatrace. In this survey, 88% of respondents believe that the strategic importance of their role is better understood today than three years ago, even though only 20% consider their organization mature in this area. In addition, 76% receive bonuses or rewards when key reliability indicators are met.
Among the tasks that take up most of their time, the SREs surveyed cite first and foremost reducing mean time to repair (MTTR), mentioned by 67% of them. This is followed by developing and maintaining automation code (60%), rapidly detecting and resolving security vulnerabilities (58%), and designing tests and experiments to reduce the risk of failure in production (52%). Security is becoming a larger part of the SRE role: 68% of respondents expect their role in this area to become increasingly central, in particular because of the growing use of third-party software libraries in the development of cloud applications.
Automation and AI to extend SRE practices
However, SREs report certain recurring difficulties. Almost all of them encounter obstacles when defining service level objectives (SLOs), even though these are increasingly important in delivering a quality customer experience. 64% of SREs mention too many data sources, 54% find it difficult to identify the most relevant indicators for a service, and 36% point to the inability of monitoring tools to easily define and track SLOs. Respondents also note difficulties in managing and evaluating SLOs: the first is that teams and tools operate in silos (cited by 68%), followed by the growing complexity of applications, which creates blind spots (59%). Finally, 52% mention an inability to correlate performance indicators with user experience.
For respondents, one of the keys to extending SRE practices lies in greater use of automation. In terms of tools, they mostly use in-house solutions (66%), which are difficult to scale. Today, they rely on automation to reduce security vulnerabilities (61%) and application failures through auto-remediation (57%), to accelerate the pace of delivery (56%), and to predict SLO breaches before they occur (55%). Artificial intelligence is another lever seen as promising, with 68% of SREs indicating that they are expanding their use of AIOps technologies. They believe these will allow teams to automate more critical processes so that service levels are continuously met (64%). AIOps will also help prioritize the issues with the greatest impact on user satisfaction (63%), as well as security vulnerabilities, to minimize downtime (62%). Finally, it is a way to free up time and make better use of the operations teams' capacity (62%).
Article written by
Aurélie Chandeze, CIO Deputy Editor-in-Chief