Seeking SRE: Conversations About Running Production Systems at Scale

Organizations big and small have started to realize just how crucial system and application reliability is to their business. They've also learned just how difficult it is to maintain that reliability while iterating at the speed demanded by the marketplace. Site Reliability Engineering (SRE) is a proven approach to this challenge.

SRE is a large and rich topic to discuss. Google led the way with Site Reliability Engineering, the wildly successful O'Reilly book that described Google's creation of the discipline and the implementation that's allowed them to operate at a planetary scale. Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.

Listen as engineers and other leaders in the field discuss:

  • Different ways of implementing SRE and SRE principles in a wide variety of settings
  • How SRE relates to other approaches such as DevOps
  • Specialties on the cutting edge that will soon be commonplace in SRE
  • Best practices and technologies that make practicing SRE easier
  • The important but rarely explored human side of SRE

David N. Blank-Edelman is the book's curator and editor.

Product Details

ISBN-13: 9781491978818
Publisher: O'Reilly Media, Incorporated
Publication date: 08/21/2018
Sold by: Barnes & Noble
Format: eBook
Pages: 590
File size: 13 MB

About the Author

David N. Blank-Edelman is the Director of Technology at the Northeastern University College of Computer and Information Science. He has spent the last 25 years as a system/network administrator in large multi-platform environments, including Brandeis University, Cambridge Technology Group, and the MIT Media Laboratory. He was also the program chair of the LISA 2005 conference and one of the LISA 2006 Invited Talks co-chairs.

Table of Contents

Introduction

Part I SRE Implementation

1 Context Versus Control in SRE

2 Interviewing Site Reliability Engineers

3 So, You Want to Build an SRE Team?

4 Using Incident Metrics to Improve SRE at Scale

5 Working with Third Parties Shouldn't Suck

6 How to Apply SRE Principles Without Dedicated SRE Teams

7 SRE Without SRE: The Spotify Case Study

8 Introducing SRE in Large Enterprises

9 From SysAdmin to SRE in 8,963 Words

10 Clearing the Way for SRE in the Enterprise

11 SRE Patterns Loved by DevOps People Everywhere

12 DevOps and SRE: Voices from the Community

13 Production Engineering at Facebook

Part II Near Edge SRE

14 In the Beginning, There Was Chaos

15 The Intersection of Reliability and Privacy

16 Database Reliability Engineering

17 Engineering for Data Durability

18 Introduction to Machine Learning for SRE

Part III SRE Best Practices and Technologies

19 Do Docs Better: Integrating Documentation into the Engineering Workflow

20 Active Teaching and Learning

21 The Art and Science of the Service-Level Objective

22 SRE as a Success Culture

23 SRE Antipatterns

24 Immutable Infrastructure and SRE

25 Scriptable Load Balancers

26 The Service Mesh: Wrangler of Your Microservices?

Part IV The Human Side of SRE

27 Psychological Safety in SRE

28 SRE Cognitive Work

29 Beyond Burnout

30 Against On-Call: A Polemic

31 Elegy for Complex Systems

32 Intersections Between Operations and Social Activism

33 Conclusion

Index
