Database Reliability Engineering: Designing and Operating Resilient Database Systems

The infrastructure-as-code revolution in IT is also affecting database administration. With this practical book, developers, system administrators, and junior to mid-level DBAs will learn how the modern practice of site reliability engineering applies to the craft of database architecture and operations. Authors Laine Campbell and Charity Majors provide a framework for professionals looking to join the ranks of today’s database reliability engineers (DBRE).

You’ll begin by exploring core operational concepts that DBREs need to master. Then you’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval. With a firm foundation in database reliability engineering, you’ll be ready to dive into the architecture and operations of any modern database.

This book covers:

  • Service-level requirements and risk management
  • Building and evolving an architecture for operational visibility
  • Infrastructure engineering and infrastructure management
  • How to facilitate the release management process
  • Data storage, indexing, and replication
  • Identifying datastore characteristics and best use cases
  • Datastore architectural components and data-driven architectures

Product Details

ISBN-13: 9781491926215
Publisher: O'Reilly Media, Incorporated
Publication date: 10/26/2017
Sold by: Barnes & Noble
Format: eBook
Pages: 294
File size: 6 MB

About the Author

Laine Campbell is a co-founder at Pythian, where she leads the open source database practice. She specializes in database architecture and operations, particularly MySQL and Cassandra. Laine founded and led PalominoDB, and then Blackbird, for 8 years, where her team of DBAs supported many of the most exciting database infrastructures in the industry. Before that, she designed, built, and supported the Travelocity databases for 8 years with a remarkable team. She lives in Las Vegas and travels extensively.


Charity Majors is a Production Engineering Manager working on Parse at Facebook, with over 10 years of experience running operations for some of the world's fastest-growing tech startups. Before Parse she built infrastructure and teams for a series of startups, including Linden Lab and Shopkick. She loves hard scaling problems of all kinds, both technical and cultural, but always seems to end up doing stuff with databases. She rants a lot about ops, DevOps, performance tuning, and culture. Charity lives in San Francisco and is never, ever leaving.

Table of Contents

Foreword
Preface

1 Introducing Database Reliability Engineering
  Guiding Principles of the DBRE
  Protect the Data
  Self-Service for Scale
  Elimination of Toil
  Databases Are Not Special Snowflakes
  Eliminate the Barriers Between Software and Operations
  Operations Core Overview
  Hierarchy of Needs
  Survival and Safety
  Love and Belonging
  Esteem
  Self-actualization
  Wrapping Up

2 Service-Level Management
  Why Do I Need Service-Level Objectives?
  Service-Level Indicators
  Latency
  Availability
  Throughput
  Durability
  Cost or Efficiency
  Defining Service Objectives
  Latency Indicators
  Availability Indicators
  Throughput Indicators
  Monitoring and Reporting on SLOs
  Monitoring Availability
  Monitoring Latency
  Monitoring Throughput
  Monitoring Cost and Efficiency
  Wrapping Up

3 Risk Management
  Risk Considerations
  Unknown Factors and Complexity
  Availability of Resources
  Human Factors
  Group Factors
  What Do We Do?
  What Not to Do
  A Working Process: Bootstrapping
  Service Risk Evaluation
  Architectural Inventory
  Prioritization
  Control and Decision Making
  Ongoing Iterations
  Wrapping Up

4 Operational Visibility
  The New Rules of Operational Visibility
  Treat Op Viz Systems Like BI Systems
  Distributed Ephemeral Environments Trending to the Norm
  Store at High Resolutions for Key Metrics
  Keep Your Architecture Simple
  An Op Viz Framework
  Data In
  Telemetry/Metrics
  Events
  Logs
  Data Out
  Bootstrapping Your Monitoring
  Is the Data Safe?
  Is the Service Up?
  Are the Consumers in Pain?
  Instrumenting the Application
  Distributed Tracing
  Events and Logs
  Instrumenting the Server or Instance
  Events and Logs
  Instrumenting the Datastore
  Datastore Connection Layer
  Utilization
  Saturation
  Errors
  Internal Database Visibility
  Throughput and Latency Metrics
  Commits, Redo, and Journaling
  Replication State
  Memory Structures
  Locking and Concurrency
  Database Objects
  Database Queries
  Database Asserts and Events
  Wrapping Up

5 Infrastructure Engineering
  Hosts
  Physical Servers
  Operating a System and Kernel
  Storage Area Networks
  Benefits of Physical Servers
  Cons of Physical Servers
  Virtualization
  Hypervisor
  Concurrency
  Storage
  Use Cases
  Containers
  Database as a Service
  Challenges of DBaaS
  The DBRE and the DBaaS
  Wrapping Up

6 Infrastructure Management
  Version Control
  Configuration Definition
  Building from Configuration
  Maintaining Configuration
  Enforcement of Configuration Definitions
  Infrastructure Definition and Orchestration
  Monolithic Infrastructure Definitions
  Separating Vertically
  Separated Tiers (Horizontal Definitions)
  Acceptance Testing and Compliance
  Service Catalog
  Bringing It All Together
  Development Environments
  Wrapping Up

7 Backup and Recovery
  Core Concepts
  Physical versus Logical
  Online versus Offline
  Full, Incremental, and Differential
  Considerations for Recovery
  Recovery Scenarios
  Planned Recovery Scenarios
  Unplanned Scenarios
  Scenario Scope
  Scenario Impact
  Anatomy of a Recovery Strategy
  Building Block 1: Detection
  Building Block 2: Tiered Storage
  Building Block 3: A Varied Toolbox
  Building Block 4: Testing
  A Recovery Strategy Defined
  Online, Fast Storage with Full and Incremental Backups
  Online, Slow Storage with Full and Incremental Backups
  Offline Storage
  Object Storage
  Wrapping Up

8 Release Management
  Education and Collaboration
  Become a Funnel
  Foster Conversations
  Domain-Specific Knowledge
  Collaboration
  Integration
  Prerequisites
  Testing
  Test-Friendly Development Practices
  Post-Commit Testing
  Full Dataset Testing
  Downstream Tests
  Operational Tests
  Deployment
  Migrations and Versioning
  Impact Analysis
  Migration Patterns
  Manual or Automated
  Wrapping Up

9 Security
  The Purpose of Security
  Protecting Data from Theft
  Protecting from Purposeful Damage
  Protecting from Accidental Damage
  Protecting Data from Exposure
  Compliance and Auditing Standards
  Database Security as a Function
  Education and Collaboration
  Self-Service
  Integration and Testing
  Operational Visibility
  Vulnerabilities and Exploits
  STRIDE
  DREAD
  Basic Precautions
  Denial of Service
  SQL Injection
  Network and Authentication Protocols
  Encryption of Data
  Financial Data
  Personal Health Data
  Private Individual Data
  Military or Government Data
  Confidential/Sensitive Business Data
  Data in Transit
  Data in the Database
  Data in the Filesystem
  Wrapping Up

10 Data Storage, Indexing, and Replication
  Data Structure Storage
  Database Row Storage
  Sorted-String Tables and Log-Structured Merge Trees
  Indexing
  Logs and Databases
  Data Replication
  Single-Leader
  Multi-Leader Replication
  Wrapping Up

11 Datastore Field Guide
  Conceptual Attributes of a Datastore
  The Data Model
  Transactions
  BASE
  Internal Attributes of a Datastore
  Storage
  The Ubiquitous CAP Theorem Section
  Consistency Latency Trade-offs
  Availability
  Wrapping Up

12 A Data Architecture Sampler
  Architectural Components
  Frontend Datastores
  Data Access Layer
  Database Proxies
  Event and Message Systems
  Caches and Memory Stores
  Data Architectures
  Lambda and Kappa
  Event Sourcing
  CQRS
  Wrapping Up

13 Making the Case for DBRE
  A Culture of Database Reliability
  Breaking Down Barriers
  Data-Driven Decision Making
  Data Integrity and Recoverability
  Wrapping Up

Index
