Machine Learning Engineering in Action

by Ben Wilson

eBook

$43.99 

Overview

Field-tested tips, tricks, and design patterns for building machine learning projects that are deployable, maintainable, and secure from concept to production.

In Machine Learning Engineering in Action, you will learn:

Evaluating data science problems to find the most effective solution
Scoping a machine learning project for usage expectations and budget
Process techniques that minimize wasted effort and speed up production
Assessing a project using standardized prototyping work and statistical validation
Choosing the right technologies and tools for your project
Making your codebase more understandable, maintainable, and testable
Automating your troubleshooting and logging practices (see the logging sketch after this list)
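
That last point is worth a concrete look: the book's advice (covered in chapter 8) is to stop printing and start logging. As a rough Python sketch of the idea, not code from the book (the score_batch function and its doubling "inference" step are hypothetical stand-ins):

    import logging

    # Configure logging once at the application entry point;
    # library code should only create and use module-level loggers.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger(__name__)

    def score_batch(records):
        """Hypothetical scoring step that logs instead of printing."""
        logger.info("Scoring batch of %d records", len(records))
        try:
            results = [r * 2.0 for r in records]  # stand-in for real inference
        except TypeError:
            logger.exception("Batch contained non-numeric records")
            raise
        logger.info("Scoring complete; produced %d results", len(results))
        return results

Unlike print statements, these log records carry timestamps, severity levels, and the originating module, which makes troubleshooting in production far easier to automate.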

Ferrying a machine learning project from your data science team to your end users is no easy task. Machine Learning Engineering in Action will help you make it simple. Inside, you'll find fantastic advice from veteran industry expert Ben Wilson, Principal Resident Solutions Architect at Databricks.

Ben introduces his personal toolbox of techniques for building deployable and maintainable production machine learning systems. You'll learn the importance of Agile methodologies for fast prototyping and conferring with stakeholders, while developing a new appreciation for the importance of planning. Adopting well-established software development standards will help you deliver better code management and make it easier to test, scale, and even reuse your machine learning code. Every method is explained in a friendly, peer-to-peer style and illustrated with production-ready source code.
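
As a hedged illustration of what such testable, reusable code looks like in practice, here is a minimal Python sketch: a pure imputation function paired with a unit test. The function name and test values are inventions for this example, not source code from the book:

    import math
    import unittest

    def impute_with_median(values):
        """Replace NaN entries with the median of the observed values.

        Keeping the function pure (no I/O, no global state) is what
        makes it trivially unit-testable and reusable across pipelines.
        """
        observed = sorted(v for v in values if not math.isnan(v))
        if not observed:
            raise ValueError("No non-missing values to compute a median from")
        mid = len(observed) // 2
        if len(observed) % 2:
            median = observed[mid]
        else:
            median = (observed[mid - 1] + observed[mid]) / 2
        return [median if math.isnan(v) else v for v in values]

    class ImputeTests(unittest.TestCase):
        def test_replaces_nan_with_median(self):
            result = impute_with_median([1.0, float("nan"), 3.0])
            self.assertEqual(result, [1.0, 2.0, 3.0])

    if __name__ == "__main__":
        unittest.main()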

About the technology
Deliver maximum performance from your models and data. This collection of reproducible techniques will help you build stable data pipelines, efficient application workflows, and maintainable models every time. Based on decades of good software engineering practice, machine learning engineering ensures your ML systems are resilient, adaptable, and perform in production.

About the book
Machine Learning Engineering in Action teaches you core principles and practices for designing, building, and delivering successful machine learning projects. You'll discover software engineering techniques, like conducting experiments on your prototypes and implementing modular design, that result in resilient architectures and consistent cross-team communication. Based on the author's extensive experience, every method in this book has been used to solve real-world problems.
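
Because experiment tracking with MLflow features prominently in part 1 (chapter 8 finalizes an MVP with it), a minimal sketch of logging a prototype run may help set expectations. The experiment name, parameters, and metric value below are invented for illustration, and the snippet assumes an MLflow tracking backend is already configured:

    import mlflow

    # set_experiment creates the experiment on first use.
    mlflow.set_experiment("forecasting-prototype")

    with mlflow.start_run(run_name="baseline"):
        # Hypothetical hyperparameters and evaluation score.
        mlflow.log_param("model_type", "holt_winters")
        mlflow.log_param("seasonal_periods", 12)
        mlflow.log_metric("validation_mae", 42.7)

Each run's parameters and metrics then become queryable and comparable in the MLflow UI, replacing ad hoc spreadsheets during prototype evaluation.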

What's inside

Scoping a machine learning project for usage expectations and budget
Choosing the right technologies for your design
Making your codebase more understandable, maintainable, and testable
Automating your troubleshooting and logging practices

About the reader
For data scientists who know machine learning and the basics of object-oriented programming.

About the author
Ben Wilson is Principal Resident Solutions Architect at Databricks, where he developed the Databricks Labs AutoML project. He is also an MLflow committer.

Product Details

ISBN-13: 9781638356585
Publisher: Manning
Publication date: 05/17/2022
Sold by: SIMON & SCHUSTER
Format: eBook
Pages: 576
File size: 28 MB

About the Author

Ben Wilson has worked as a professional data scientist for more than ten years. He currently works as a resident solutions architect at Databricks, where he focuses on machine learning production architecture with companies ranging from 5-person startups to the global Fortune 100. Ben is the creator and lead developer of the Databricks Labs AutoML project, a Scala- and Python-based toolkit that simplifies machine learning feature engineering, model tuning, and pipeline-enabled modeling.

Table of Contents

Preface xi

Acknowledgments xiii

About this book xv

About the author xviii

About the cover illustration xix

Part 1 An introduction to machine learning engineering 1

1 What is a machine learning engineer? 3

1.1 Why ML engineering? 5

1.2 The core tenets of ML engineering 8

Planning 8

Scoping and research 10

Experimentation 13

Development 15

Deployment 18

Evaluation 21

1.3 The goals of ML engineering 24

2 Your data science could use some engineering 26

2.1 Augmenting a complex profession with processes to increase project success 27

2.2 A foundation of simplicity 29

2.3 Co-opting principles of Agile software engineering 31

Communication and cooperation 33

Embracing and expecting change 35

2.4 The foundation of ML engineering 35

3 Before you model: Planning and scoping a project 38

3.1 Planning: You want me to predict what?! 42

Basic planning for a project 47

That first meeting 53

Plan for demos: lots of demos 56

Experimentation by solution building: Wasting time for pride's sake 58

3.2 Experimental scoping: Setting expectations and boundaries 60

What is experimental scoping? 61

Experimental scoping for the ML team: Research 62

Experimental scoping for the ML team: Experimentation 64

4 Before you model: Communication and logistics of projects 76

4.1 Communication: Defining the problem 79

Understanding the problem 80

Setting critical discussion boundaries 94

4.2 Don't waste our time: Meeting with cross-functional teams 101

Experimental update meeting: Do we know what we're doing here? 102

SME review/prototype review: Can we solve this? 103

Development progress review(s): Is this thing going to work? 105

MVP review: Did you build what we asked for? 106

Preproduction review: We really hope we didn't screw this up 107

4.3 Setting limits on your experimentation 108

Set a time limit 109

Can you put this into production? Would you want to maintain it? 112

TDD vs. RDD vs. FDD vs. CDD for ML projects 113

4.4 Planning for business rules chaos 116

Embracing chaos by planning for it 117

Human-in-the-loop design 119

What's your backup plan? 119

4.5 Talking about results 120

5 Experimentation in action: Planning and researching an ML project 124

5.1 Planning experiments 126

Perform basic research and planning 126

Forget the blogs: read the API docs 130

Draw straws for an internal hackathon 135

Level the playing field 136

5.2 Performing experimental prep work 137

Performing data analysis 139

Moving from script to reusable code 146

One last note on building reusable code for experimentation 154

6 Experimentation in action: Testing and evaluating a project 159

6.1 Testing ideas 162

Setting guidelines in code 163

Running quick forecasting tests 172

Whittling down the possibilities 190

Evaluating prototypes properly 191

Making a call on the direction to go in 193

So … what's next? 196

7 Experimentation in action: Moving from prototype to MVP 197

7.1 Tuning: Automating the annoying stuff 199

Tuning options 201

Hyperopt primer 206

Using Hyperopt to tune a complex forecasting problem 208

7.2 Choosing the right tech for the platform and the team 215

Why Spark? 216

Handling tuning from the driver with Spark Trials 218

Handling tuning from the workers with a pandas_udf 222

Using new paradigms for teams: Platforms and technologies 226

8 Experimentation in action: Finalizing an MVP with MLflow and runtime optimization 228

8.1 Logging: Code, metrics, and results 229

MLflow tracking 230

Please stop printing and log your information 232

Version control, branch strategies, and working with others 234

8.2 Scalability and concurrency 237

What is concurrency? 239

What you can (and can't) run asynchronously 239

Part 2 Preparing for production: Creating maintainable ML 243

9 Modularity for ML: Writing testable and legible code 245

9.1 Understanding monolithic scripts and why they are bad 248

How monoliths come into being 249

Walls of text 249

Considerations for monolithic scripts 252

9.2 Debugging walls of text 255

9.3 Designing modular ML code 257

9.4 Using test-driven development for ML 264

10 Standards of coding and creating maintainable ML code 269

10.1 ML code smells 270

10.2 Naming, structure, and code architecture 273

Naming conventions and structure 273

Trying to be too clever 274

Code architecture 276

10.3 Tuple unpacking and maintainable alternatives 278

Tuple unpacking example 278

A solid alternative to tuple unpacking 280

10.4 Blind to issues: Eating exceptions and other bad practices 282

Try/catch with the precision of a shotgun 283

Exception handling with laser precision 285

Handling errors the right way 286

10.5 Use of global mutable objects 288

How mutability can burn you 288

Encapsulation to prevent mutable side effects 290

10.6 Excessively nested logic 292

11 Model measurement and why it's so important 300

11.1 Measuring model attribution 302

Measuring prediction performance 302

Clarifying correlation vs. causation 312

11.2 Leveraging A/B testing for attribution calculations 316

A/B testing 101 317

Evaluating continuous metrics 319

Using alternative displays and tests 325

Evaluating categorical metrics 329

12 Holding on to your gains by watching for drift 334

12.1 Detecting drift 335

What influences drift? 336

12.2 Responding to drift 347

What can we do about it? 348

Responding to drift 350

13 ML development hubris 353

13.1 Elegant complexity vs. overengineering 355

Lightweight scripted style (imperative) 357

An overengineered mess 361

13.2 Unintentional obfuscation: Could you read this if you didn't write it? 364

The flavors of obfuscation 365

Troublesome coding habits recap 378

13.3 Premature generalization, premature optimization, and other bad ways to show how smart you are 379

Generalization and frameworks: Avoid them until you can't 379

Optimizing too early 382

13.4 Do you really want to be the canary? Alpha testing and the dangers of the open source coal mine 390

13.5 Technology-driven development vs. solution-driven development 393

Part 3 Developing production machine learning code 397

14 Writing production code 399

14.1 Have you met your data? 401

Make sure you have the data 403

Check your data provenance 404

Find a source of truth and align on it 408

Don't embed data cleansing into your production code 410

14.2 Monitoring your features 412

14.3 Monitoring everything else in the model life cycle 417

14.4 Keeping things as simple as possible 421

Simplicity in problem definitions 423

Simplicity in implementation 424

14.5 Wireframing ML projects 426

14.6 Avoiding cargo cult ML behavior 432

15 Quality and acceptance testing 438

15.1 Data consistency 439

Training and inference skew 440

A brief intro to feature stores 441

Process over technology 442

The dangers of a data silo 445

15.2 Fallbacks and cold starts 447

Leaning heavily on prior art 448

Cold-start woes 450

15.3 End user vs. internal use testing 453

Biased testing 456

Dogfooding 457

SME evaluation 459

15.4 Model interpretability 460

Shapley additive explanations 461

Using shap 463

16 Production infrastructure 471

16.1 Artifact management 472

MLflow's model registry 474

Interfacing with the model registry 476

16.2 Feature stores 482

What a feature store is used for 483

Using a feature store 485

Evaluating a feature store 489

16.3 Prediction serving architecture 490

Determining serving needs 493

Bulk external delivery 500

Microbatch streaming 502

Real-time server-side 503

Integrated models (edge deployment) 507

Appendix A Big O(no) and how to think about runtime performance 510

Appendix B Setting up a development environment 540

Index 547
