Machine Learning Engineering in Action
Overview
In Machine Learning Engineering in Action, you will learn:
- Evaluating data science problems to find the most effective solution
- Scoping a machine learning project for usage expectations and budget
- Process techniques that minimize wasted effort and speed up production
- Assessing a project using standardized prototyping work and statistical validation
- Choosing the right technologies and tools for your project
- Making your codebase more understandable, maintainable, and testable
- Automating your troubleshooting and logging practices
Ferrying a machine learning project from your data science team to your end users is no easy task. Machine Learning Engineering in Action will help you make it simple. Inside, you'll find fantastic advice from veteran industry expert Ben Wilson, Principal Resident Solutions Architect at Databricks.
Ben introduces his personal toolbox of techniques for building deployable and maintainable production machine learning systems. You'll learn how Agile methodologies enable fast prototyping and close collaboration with stakeholders, and you'll develop a new appreciation for the value of planning. Adopting well-established software development standards will improve your code management and make your machine learning code easier to test, scale, and even reuse. Every method is explained in a friendly, peer-to-peer style and illustrated with production-ready source code.
About the technology
Deliver maximum performance from your models and data. This collection of reproducible techniques will help you build stable data pipelines, efficient application workflows, and maintainable models every time. Based on decades of good software engineering practice, machine learning engineering ensures your ML systems are resilient, adaptable, and perform in production.
About the book
Machine Learning Engineering in Action teaches you core principles and practices for designing, building, and delivering successful machine learning projects. You'll discover software engineering techniques, like conducting experiments on your prototypes and implementing modular design, that result in resilient architectures and consistent cross-team communication. Drawn from the author's extensive experience, every method in this book has been applied to real-world projects.
What's inside
- Scoping a machine learning project for usage expectations and budget
- Choosing the right technologies for your design
- Making your codebase more understandable, maintainable, and testable
- Automating your troubleshooting and logging practices
About the reader
For data scientists who know machine learning and the basics of object-oriented programming.
About the author
Ben Wilson is Principal Resident Solutions Architect at Databricks, where he developed the Databricks Labs AutoML project, and is an MLflow committer.
Product Details
| ISBN-13 | 9781638356585 |
|---|---|
| Publisher | Manning |
| Publication date | 05/17/2022 |
| Sold by | SIMON & SCHUSTER |
| Format | eBook |
| Pages | 576 |
| File size | 28 MB |
Table of Contents
Preface xi
Acknowledgments xiii
About this book xv
About the author xviii
About the cover illustration xix
Part 1 An introduction to machine learning engineering 1
1 What is a machine learning engineer? 3
1.1 Why ML engineering? 5
1.2 The core tenets of ML engineering 8
Planning 8
Scoping and research 10
Experimentation 13
Development 15
Deployment 18
Evaluation 21
1.3 The goals of ML engineering 24
2 Your data science could use some engineering 26
2.1 Augmenting a complex profession with processes to increase project success 27
2.2 A foundation of simplicity 29
2.3 Co-opting principles of Agile software engineering 31
Communication and cooperation 33
Embracing and expecting change 35
2.4 The foundation of ML engineering 35
3 Before you model: Planning and scoping a project 38
3.1 Planning: You want me to predict what?! 42
Basic planning for a project 47
That first meeting 53
Plan for demos: lots of demos 56
Experimentation by solution building: Wasting time for pride's sake 58
3.2 Experimental scoping: Setting expectations and boundaries 60
What is experimental scoping? 61
Experimental scoping for the ML team: Research 62
Experimental scoping for the ML team: Experimentation 64
4 Before you model: Communication and logistics of projects 76
4.1 Communication: Defining the problem 79
Understanding the problem 80
Setting critical discussion boundaries 94
4.2 Don't waste our time: Meeting with cross-functional teams 101
Experimental update meeting: Do we know what we're doing here? 102
SME review/prototype review: Can we solve this? 103
Development progress review(s): Is this thing going to work? 105
MVP review: Did you build what we asked for? 106
Preproduction review: We really hope we didn't screw this up 107
4.3 Setting limits on your experimentation 108
Set a time limit 109
Can you put this into production? Would you want to maintain it? 112
TDD vs. RDD vs. FDD vs. CDD for ML projects 113
4.4 Planning for business rules chaos 116
Embracing chaos by planning for it 117
Human-in-the-loop design 119
What's your backup plan? 119
4.5 Talking about results 120
5 Experimentation in action: Planning and researching an ML project 124
5.1 Planning experiments 126
Perform basic research and planning 126
Forget the blogs: read the API docs 130
Draw straws for an internal hackathon 135
Level the playing field 136
5.2 Performing experimental prep work 137
Performing data analysis 139
Moving from script to reusable code 146
One last note on building reusable code for experimentation 154
6 Experimentation in action: Testing and evaluating a project 159
6.1 Testing ideas 162
Setting guidelines in code 163
Running quick forecasting tests 172
Whittling down the possibilities 190
Evaluating prototypes properly 191
Making a call on the direction to go in 193
So … what's next? 196
7 Experimentation in action: Moving from prototype to MVP 197
7.1 Tuning: Automating the annoying stuff 199
Tuning options 201
Hyperopt primer 206
Using Hyperopt to tune a complex forecasting problem 208
7.2 Choosing the right tech for the platform and the team 215
Why Spark? 216
Handling tuning from the driver with Spark Trials 218
Handling tuning from the workers with a pandas_udf 222
Using new paradigms for teams: Platforms and technologies 226
8 Experimentation in action: Finalizing an MVP with MLflow and runtime optimization 228
8.1 Logging: Code, metrics, and results 229
MLflow tracking 230
Please stop printing and log your information 232
Version control, branch strategies, and working with others 234
8.2 Scalability and concurrency 237
What is concurrency? 239
What you can (and can't) run asynchronously 239
Part 2 Preparing for production: Creating maintainable ML 243
9 Modularity for ML: Writing testable and legible code 245
9.1 Understanding monolithic scripts and why they are bad 248
How monoliths come into being 249
Walls of text 249
Considerations for monolithic scripts 252
9.2 Debugging walls of text 255
9.3 Designing modular ML code 257
9.4 Using test-driven development for ML 264
10 Standards of coding and creating maintainable ML code 269
10.1 ML code smells 270
10.2 Naming, structure, and code architecture 273
Naming conventions and structure 273
Trying to be too clever 274
Code architecture 276
10.3 Tuple unpacking and maintainable alternatives 278
Tuple unpacking example 278
A solid alternative to tuple unpacking 280
10.4 Blind to issues: Eating exceptions and other bad practices 282
Try/catch with the precision of a shotgun 283
Exception handling with laser precision 285
Handling errors the right way 286
10.5 Use of global mutable objects 288
How mutability can burn you 288
Encapsulation to prevent mutable side effects 290
10.6 Excessively nested logic 292
11 Model measurement and why it's so important 300
11.1 Measuring model attribution 302
Measuring prediction performance 302
Clarifying correlation vs. causation 312
11.2 Leveraging A/B testing for attribution calculations 316
A/B testing 101 317
Evaluating continuous metrics 319
Using alternative displays and tests 325
Evaluating categorical metrics 329
12 Holding on to your gains by watching for drift 334
12.1 Detecting drift 335
What influences drift? 336
12.2 Responding to drift 347
What can we do about it? 348
Responding to drift 350
13 ML development hubris 353
13.1 Elegant complexity vs. overengineering 355
Lightweight scripted style (imperative) 357
An overengineered mess 361
13.2 Unintentional obfuscation: Could you read this if you didn't write it? 364
The flavors of obfuscation 365
Troublesome coding habits recap 378
13.3 Premature generalization, premature optimization, and other bad ways to show how smart you are 379
Generalization and frameworks: Avoid them until you can't 379
Optimizing too early 382
13.4 Do you really want to be the canary? Alpha testing and the dangers of the open source coal mine 390
13.5 Technology-driven development vs. solution-driven development 393
Part 3 Developing production machine learning code 397
14 Writing production code 399
14.1 Have you met your data? 401
Make sure you have the data 403
Check your data provenance 404
Find a source of truth and align on it 408
Don't embed data cleansing into your production code 410
14.2 Monitoring your features 412
14.3 Monitoring everything else in the model life cycle 417
14.4 Keeping things as simple as possible 421
Simplicity in problem definitions 423
Simplicity in implementation 424
14.5 Wireframing ML projects 426
14.6 Avoiding cargo cult ML behavior 432
15 Quality and acceptance testing 438
15.1 Data consistency 439
Training and inference skew 440
A brief intro to feature stores 441
Process over technology 442
The dangers of a data silo 445
15.2 Fallbacks and cold starts 447
Leaning heavily on prior art 448
Cold-start woes 450
15.3 End user vs. internal use testing 453
Biased testing 456
Dogfooding 457
SME evaluation 459
15.4 Model interpretability 460
Shapley additive explanations 461
Using shap 463
16 Production infrastructure 471
16.1 Artifact management 472
MLflow's model registry 474
Interfacing with the model registry 476
16.2 Feature stores 482
What a feature store is used for 483
Using a feature store 485
Evaluating a feature store 489
16.3 Prediction serving architecture 490
Determining serving needs 493
Bulk external delivery 500
Microbatch streaming 502
Real-time server-side 503
Integrated models (edge deployment) 507
Appendix A Big O(no) and how to think about runtime performance 510
Appendix B Setting up a development environment 540
Index 547