Table of Contents
Acknowledgments xv
Preface xvii
Chapter 1 Introduction to Machine Learning and Deep Learning 1
1.1 A Brief History of AI and Machine Learning 1
1.2 Building Blocks of a Machine Learning Project 2
1.3 Machine Learning Algorithms vs. Traditional Computer Programs 4
1.4 How Deep Learning Works 4
1.5 Machine Learning and Deep Learning Applications 4
1.5.1 Applications in Daily Life 4
1.5.2 Machine Learning in Manufacturing 5
1.5.3 Applications in Robotics 5
1.5.4 Applications in Banking and Finance 5
1.5.5 Applications of Deep Learning 6
1.6 The Organization of This Book 6
1.7 Prerequisites: Essential Mathematics 8
1.8 The Terminology You Should Know 8
1.9 Machine Learning: A Wider Outlook Will Certainly Help 8
1.9.1 AI: The Current State 9
1.9.2 AI Future Road Map: How Disruptive Is It? 9
1.9.3 Ethical, Social, and Legal Issues Related to AI 10
1.10 Python and Its Potential as the Language of Machine Learning 11
1.11 About TensorFlow 11
1.12 Conclusion 11
1.13 References 12
Chapter 2 Basics of Python Programming and Statistics 13
2.1 Introduction to Python 14
2.1.1 Why Python? 14
2.1.2 Python Versions 14
2.1.3 Python IDEs 15
2.1.4 Installing Python 16
2.2 Getting Started with Python Coding 18
2.2.1 Working with Spyder IDE 18
2.2.2 First Few Lines of Code 19
2.2.3 Errors and Error Messages 21
2.2.4 Naming Conventions 22
2.2.5 Printing with a Message 23
2.3 Types of Objects in Python 23
2.3.1 Numbers 23
2.3.2 Strings 23
2.3.3 Working with Lists 25
2.3.4 Dictionaries 26
2.4 Python Packages 28
2.4.1 NumPy 30
2.4.2 Pandas 30
2.4.3 Matplotlib 31
2.4.4 Scikit-learn 32
2.4.5 NLTK 33
2.4.6 TensorFlow 33
2.5 Conditions and Loops in Python 33
2.6 Data Handling and Pandas Deep Dive 35
2.6.1 Data Importing and Basic Details 35
2.6.2 Subsets and Data Filters 38
2.6.3 Useful Pandas Commands 42
2.7 Basic Descriptive Statistics 44
2.7.1 Mean 44
2.7.2 Median 45
2.7.3 Variance and Standard Deviation 46
2.8 Data Exploration 48
2.8.1 Exploring Numeric Continuous Variables 49
2.8.2 Exploring Discrete and Categorical Variables 56
2.8.3 Exploring Other Variables 58
2.9 Conclusion 59
2.10 Practice Problems 59
2.11 References 60
Chapter 3 Regression and Logistic Regression 61
3.1 What Is Regression? 61
3.2 Regression Model Building 64
3.2.1 Finding the Regression Coefficients 64
3.2.2 Regression Line Example 69
3.3 R-Squared 72
3.4 Multiple Regression 75
3.5 Multicollinearity in Regression 77
3.5.1 What Is Multicollinearity? 80
3.5.2 Detection of Multicollinearity 81
3.5.3 Variance Inflation Factor Calculation 82
3.6 Individual Impact of the Variables in Regression 87
3.6.1 P-value 88
3.6.2 Theory Behind P-value 91
3.7 Steps Needed in Building a Regression Model 92
3.8 Logistic Regression Model 92
3.8.1 Logistic Function 95
3.9 Logistic Regression Model Building 97
3.10 Accuracy of Logistic Regression Line 99
3.10.1 Accuracy Calculations 100
3.11 Multiple Logistic Regression Line 102
3.12 Multicollinearity in Logistic Regression 106
3.13 Individual Impact of the Variables 108
3.14 Steps in Building a Logistic Regression Model 111
3.15 Linear vs. Logistic Regression Comparison 111
3.16 Conclusion 112
3.17 Practice Problems 112
3.18 Reference 113
Chapter 4 Decision Trees 115
4.1 What Are Decision Trees? 115
4.2 Splitting Criterion Metrics: Entropy and Information Gain 119
4.2.1 Entropy: The Measure of Impurity 120
4.2.2 Information Gain 121
4.2.3 Gini Index: An Alternative to Entropy 122
4.3 Decision Tree Algorithm 124
4.4 Case Study: Contact Center Customer Segmentation 127
4.4.1 Objective and Data Exploration 127
4.4.2 Model Building Code in Python 130
4.4.3 Drawing the Decision Tree 131
4.4.4 Tree Output Interpretation 132
4.4.5 Tree Validation and Accuracy 136
4.5 The Problem of Overfitting 137
4.5.1 Huge Decision Trees 137
4.5.2 The Train and Test Data 139
4.5.3 Overfitting 139
4.6 Pruning of Decision Trees 143
4.6.1 Max_Depth 144
4.7 The Challenge of Underfitting 145
4.8 Binary Search on Pruning Parameters 146
4.9 More Pruning Parameters 147
4.9.1 Maximum Leaf Nodes 148
4.9.2 Minimum Samples in the Leaf Node 149
4.10 Steps in Building a Decision Tree Model 149
4.11 Conclusion 150
4.12 Practice Problems 150
Chapter 5 Model Selection and Cross-Validation 151
5.1 Steps in Building a Model 151
5.2 Model Validation Measures: Regression 152
5.3 Case Study: House Sales in King County, Washington 153
5.3.1 Objective and Data 153
5.3.2 Model Building and Validation 156
5.4 Model Validation Measures: Classification 158
5.4.1 Confusion Matrix and Accuracy 158
5.4.2 Measures for Class Imbalance 160
5.5 Bias-Variance Trade-Off 178
5.5.1 The Problem of Overfitting: Variance 178
5.5.2 The Problem of Underfitting: Bias 178
5.5.3 Bias-Variance Trade-Off 179
5.5.4 Case Study: Pima Indians Diabetes Prediction 181
5.6 Cross-Validation 184
5.6.1 Cross-Validation: An Example 184
5.6.2 K-Fold Cross-Validation 185
5.6.3 Train-Validation-Holdout Cross-Validation Method 187
5.7 Feature Engineering Tips and Tricks 189
5.7.1 What Is Feature Engineering? 189
5.7.2 The Dummy Variable Creation or One Hot Encoding 192
5.7.3 Handling Longitude and Latitude 202
5.7.4 Handling Date Variables 208
5.7.5 Transformations 216
5.8 Dealing with Class Imbalance 220
5.8.1 Oversampling and Undersampling 221
5.9 Conclusion 224
5.10 Practice Problems 224
5.11 References 225
Chapter 6 Cluster Analysis 227
6.1 Unsupervised Learning 227
6.1.1 Cluster Analysis 228
6.1.2 Case Study: Customer Segmentation Wholesale Customers Data 228
6.2 Distance Measure 232
6.2.1 Euclidean Distance 233
6.2.2 Distance Matrix 234
6.3 K-Means Clustering Algorithm 236
6.3.1 Steps in Clustering Algorithm 236
6.3.2 K-Means Clustering Algorithm: Illustration 239
6.3.3 K-Means Clustering Algorithm: Output 240
6.4 Building K-Means Clusters 240
6.4.1 Wholesale Data Case Study Final Result 244
6.5 Deciding the Number of Clusters 246
6.5.1 Elbow Method 246
6.6 Conclusion 249
6.7 Practice Problems 249
6.8 References 249
Chapter 7 Random Forests and Boosting 251
7.1 Ensemble Models 251
7.1.1 Wisdom of Crowds 251
7.1.2 Ensemble Models Approach 252
7.2 Bagging 253
7.2.1 Bootstrap Sampling 253
7.2.2 Bagging Algorithm 254
7.3 Random Forest 255
7.3.1 Random Forest Algorithm 255
7.3.2 Hyperparameters in Random Forest 257
7.4 Case Study: Car Accidents Prediction 257
7.4.1 Background and Objective 257
7.4.2 Data Exploration 257
7.4.3 Model Building and Validation 260
7.5 Boosting 262
7.6 AdaBoosting Algorithm 262
7.7 Gradient Boosting Algorithm 264
7.7.1 Gradient Boosting Algorithm 264
7.7.2 Gradient Boosting on Trees 264
7.7.3 Hyperparameters in Boosting 265
7.7.4 Gradient Boosting Illustration 266
7.8 Case Study: Income Prediction from Census Data 269
7.8.1 Background and Objective 269
7.8.2 Data Exploration 270
7.8.3 Data Cleaning and Feature Engineering 274
7.8.4 Model Building and Validation 276
7.9 Conclusion 279
7.10 Practice Problems 279
7.11 References 280
Chapter 8 Artificial Neural Networks 281
8.1 Network Diagram for Logistic Regression 281
8.2 Concept of Decision Boundary 283
8.2.1 Decision Boundary: Code 286
8.3 Multiple Decision Boundaries Problem 289
8.4 Multiple Decision Boundaries Solution 292
8.4.1 Building Intermediate Output Models 293
8.5 Neural Network Intuition 299
8.5.1 Hidden Layers and Hidden Nodes 299
8.6 Neural Network Algorithm 301
8.6.1 Neural Network Algorithm: Nontechnical 303
8.6.2 Neural Network Algorithm: Mathematical Formulas 304
8.6.3 Neural Network Algorithm: A Worked-Out Example 305
8.7 The Concept of Gradient Descent 309
8.7.1 Gradient Descent for Regression 310
8.7.2 Learning Rate 310
8.7.3 Code of Gradient Descent for Regression 311
8.7.4 Multiple Solutions for a Problem 312
8.8 Case Study: Recognizing Handwritten Digits 313
8.8.1 Background and Objective 313
8.8.2 Data 315
8.8.3 Model Building 318
8.8.4 Model Predictions and Validation 322
8.9 Deep Neural Networks 324
8.10 Conclusion 325
8.11 Practice Problems 325
8.12 References 325
Chapter 9 TensorFlow and Keras 327
9.1 Deep Neural Networks 327
9.1.1 Number of Parameters 327
9.2 Deep Learning Frameworks 328
9.2.1 What Is TensorFlow? 329
9.2.2 Computational Graphs 329
9.2.3 Python Notebook 331
9.2.4 Installing TensorFlow 338
9.3 Key Terms in TensorFlow 339
9.3.1 Tensors 339
9.4 Model Building with TensorFlow 342
9.4.1 Building a Regression Model with TensorFlow 343
9.4.2 Logistic Regression Model Building with TensorFlow 346
9.5 Keras 349
9.5.1 What Is Keras? 349
9.5.2 Working with Keras 349
9.5.3 MNIST on Keras 349
9.6 Conclusion 352
9.7 References 352
Chapter 10 Deep Learning Hyperparameters 353
10.1 Regularization 353
10.1.1 Regularization in Regression 354
10.1.2 L1 and L2 Regularization 357
10.1.3 Regularization in Neural Networks 358
10.1.4 L1 and L2 Regularization Code 365
10.1.5 Data Standardization in L1 and L2 Regularization 367
10.2 Dropout Regularization 367
10.2.1 Dropout Method's Code 369
10.3 Early Stopping Method 371
10.4 Loss Functions 374
10.5 Activation Functions 376
10.5.1 Sigmoid 376
10.5.2 Tanh 377
10.5.3 ReLU Activation 380
10.5.4 Softmax 383
10.5.5 Code Activation Functions 384
10.6 Learning Rate 385
10.6.1 Learning Rate Demo 388
10.6.2 Learning Rate Code 392
10.6.3 Momentum 395
10.7 Optimizers 397
10.7.1 SGD: Stochastic Gradient Descent 397
10.7.2 Mini-Batch Gradient Descent 398
10.8 Conclusion 400
Chapter 11 Convolutional Neural Networks 401
11.1 ANNs for Images 402
11.1.1 Spatial Dependence 402
11.1.2 Number of Free Parameters in ANNs 405
11.2 Filters 405
11.2.1 How a Filter Works 405
11.2.2 Kernel Matrix for Detecting Features 408
11.2.3 Weights in the Kernel Matrix 413
11.3 The Convolution Layer 413
11.3.1 The Convolution Layer in Keras 414
11.3.2 Filters for Color Images 417
11.3.3 Zero Padding 420
11.3.4 Strides 426
11.4 Pooling Layer 428
11.4.1 How Pooling Works 428
11.4.2 Why Pooling Is Done 429
11.5 CNN Architecture 431
11.5.1 Weights in a CNN Model 432
11.5.2 CNN Code 434
11.6 Case Study: Sign Language Reading from Images 438
11.6.1 Background and Objective 438
11.6.2 Data 439
11.6.3 Model Building and Validation 441
11.7 Scheming the Ideal CNN Architecture 445
11.7.1 Number of Convolution and Pooling Layers 445
11.7.2 Number of Filters in the Convolution Layer 446
11.7.3 Batch Normalization 454
11.7.4 Choosing the Optimizers 460
11.8 Steps in Building a CNN Model 464
11.9 Conclusion 464
11.10 Practice Problems 464
11.11 References 465
Chapter 12 Recurrent Neural Networks and Long Short-Term Memory 467
12.1 Cross-Sectional Data vs. Sequential Data 467
12.1.1 Cross-Sectional Data 467
12.1.2 Sequential Data 467
12.2 Models for Sequential Data 467
12.2.1 ANN for Sequential Data 468
12.2.2 CNN for Sequential Data 468
12.2.3 Sequential ANN 469
12.3 Case Study: Word Prediction 471
12.3.1 Objective and Data 471
12.3.2 Data Preprocessing 473
12.3.3 Model Building 476
12.3.4 Prediction 479
12.4 Recurrent Neural Networks 479
12.4.1 Backpropagation Through Time 480
12.4.2 Calculating the Number of Parameters: An Example 483
12.4.3 RNN Model Building Code 485
12.4.4 Word Prediction Using RNN Model 486
12.5 RNN for Long Sequences 489
12.5.1 Case Study: Predicting the Characters to Form the Next Word 489
12.5.2 Problem of Vanishing Gradients 498
12.6 Long Short-Term Memory 499
12.6.1 LSTM Gates 500
12.6.2 LSTM Intuition 505
12.6.3 LSTM Case Study 506
12.7 Sequence to Sequence Models 509
12.7.1 Word2vec 510
12.8 Case Study: Language Translation 515
12.8.1 Objective and Data 515
12.8.2 Data Preprocessing 516
12.8.3 Encoder and Decoder 517
12.8.4 Model Building 518
12.8.5 Predictions Using the Model 519
12.9 Conclusion 523
12.10 Practice Problems 523
12.11 References 523
Index 525