Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs

ISBN-10: 0262038390
ISBN-13: 9780262038393
Pub. Date: 07/17/2018
Publisher: MIT Press

Hardcover

$80.00 

Overview

The first book to present the common mathematical foundations of big data analysis across a range of applications and technologies.

Today, the volume, velocity, and variety of data are increasing rapidly across a range of fields, including Internet search, healthcare, finance, social media, wireless devices, and cybersecurity. Indeed, these data are growing at a rate beyond our capacity to analyze them. The tools—including spreadsheets, databases, matrices, and graphs—developed to address this challenge all reflect the need to store and operate on data as whole sets rather than as individual elements. This book presents the common mathematical foundations of these data sets that apply across many applications and technologies. Associative arrays unify and simplify data, allowing readers to look past the differences among the various tools and leverage their mathematical similarities in order to solve the hardest big data challenges.

The book first introduces the concept of the associative array in practical terms, presents the associative array manipulation system D4M (Dynamic Distributed Dimensional Data Model), and describes the application of associative arrays to graph analysis and machine learning. It provides a mathematically rigorous definition of associative arrays and describes the properties of associative arrays that arise from this definition. Finally, the book shows how concepts of linearity can be extended to encompass associative arrays. Mathematics of Big Data can be used as a textbook or reference by engineers, scientists, mathematicians, computer scientists, and software engineers who analyze big data.
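As a rough illustration of the idea, an associative array can be pictured as a sparse table whose rows and columns are labeled by arbitrary keys, and on which addition and multiplication act over the whole array at once. The sketch below is a minimal Python rendering of that picture under stated assumptions: it is not the D4M API, the function names (`assoc_add`, `assoc_matmul`, `transpose`) are hypothetical, the track data is made up, and it uses only ordinary addition and multiplication rather than the more general semirings the book covers.

```python
from collections import defaultdict

# Hypothetical illustration only; this is not the D4M API.
# An associative array is modeled as a dict: (row_key, col_key) -> value.
# Missing keys are treated as zero.

def assoc_add(a, b):
    """Element-wise addition: union of keys, summing values present in both."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out.get(key, 0) + val
    return out

def transpose(a):
    """Swap row and column keys."""
    return {(col, row): val for (row, col), val in a.items()}

def assoc_matmul(a, b):
    """Array multiplication with ordinary + and *: c[i, j] = sum_k a[i, k] * b[k, j]."""
    # Index b by its row key so matching inner keys are found quickly.
    b_rows = defaultdict(dict)
    for (k, j), val in b.items():
        b_rows[k][j] = val
    out = {}
    for (i, k), a_val in a.items():
        for j, b_val in b_rows.get(k, {}).items():
            out[(i, j)] = out.get((i, j), 0) + a_val * b_val
    return out

# Made-up data: rows are tracks, columns are "field|value" strings.
tracks = {
    ("track1", "artist|A"): 1,
    ("track1", "genre|rock"): 1,
    ("track2", "artist|A"): 1,
    ("track2", "genre|pop"): 1,
}

# tracks times its transpose counts the column keys shared by each pair of tracks,
# which can be read as a weighted track-to-track adjacency array.
shared = assoc_matmul(tracks, transpose(tracks))
```

Here the same object serves as a spreadsheet-like table (`tracks`), a sparse matrix (the argument to `assoc_matmul`), and a graph (`shared`), which is the kind of unification the overview describes.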


Product Details

ISBN-13: 9780262038393
Publisher: MIT Press
Publication date: 07/17/2018
Series: MIT Lincoln Laboratory Series
Pages: 448
Product dimensions: 7.00(w) x 9.10(h) x 1.30(d)
Age Range: 18 Years

About the Author

Jeremy Kepner is an MIT Lincoln Laboratory Fellow, Founder and Head of the MIT Lincoln Laboratory Supercomputing Center, and Research Affiliate in MIT's Mathematics Department.

Hayden Jananthan is a PhD candidate in the Department of Mathematics at Vanderbilt University.

Charles E. Leiserson is Professor of Computer Science and Engineering at the Massachusetts Institute of Technology.

Table of Contents

Foreword

Preface

About the Authors

About the Cover

Acknowledgments

I Applications and Practice

1 Introduction and Overview

1.1 Mathematics of Data

1.2 Data in the World

1.3 Mathematical Foundations

1.4 Making Data Rigorous

1.5 Conclusions, Exercises, and References

2 Perspectives on Data

2.1 Interrelations

2.2 Spreadsheets

2.3 Databases

2.4 Matrices

2.5 Graphs

2.6 Map Reduce

2.7 Other Perspectives

2.8 Conclusions, Exercises, and References

3 Dynamic Distributed Dimensional Data Model

3.1 Background

3.2 Design

3.3 Matrix Mathematics

3.4 Common SQL, NoSQL, NewSQL Interface

3.5 Key-Value Store Database Schema

3.6 Data-Independent Analytics

3.7 Parallel Performance

3.8 Computing on Masked Data

3.9 Conclusions, Exercises, and References

4 Associative Arrays and Musical Metadata

4.1 Data and Metadata

4.2 Dense Data

4.3 Dense Operations

4.4 Sparse Data

4.5 Sparse Operations

4.6 Conclusions, Exercises, and References

5 Associative Arrays and Abstract Art

5.1 Visual Abstraction

5.2 Minimal Adjacency Array

5.3 Symmetric Adjacency Array

5.4 Weighted Adjacency Array

5.5 Incidence Array

5.6 Conclusions, Exercises, and References

6 Manipulating Graphs with Matrices

6.1 Introduction

6.2 Matrix Indices and Values

6.3 Composable Graph Operations and Linear Systems

6.4 Matrix Graph Operations Overview

6.5 Graph Algorithms and Diverse Semirings

6.6 Conclusions, Exercises, and References

7 Graph Analysis and Machine Learning Systems

7.1 Introduction

7.2 Data Representation

7.3 Graph Construction

7.4 Adjacency Array Graph Traversal

7.5 Incidence Array Graph Traversal

7.6 Vertex Degree Centrality

7.7 Edge Degree Centrality

7.8 Eigenvector Centrality

7.9 Singular Value Decomposition

7.10 PageRank

7.11 Deep Neural Networks

7.12 Conclusions, Exercises, and References

II Mathematical Foundations

8 Visualizing the Algebra of Associative Arrays

8.1 Associative Array Analogs of Matrix Operations

8.2 Abstract Algebra for Computer Scientists and Engineers

8.3 Depicting Mathematics

8.4 Associative Array Class Diagrams

8.5 Set

8.6 Semiring

8.7 Linear Algebra

8.8 Ordered Sets

8.9 Boolean Algebra

8.10 Associative Array Algebra

8.11 Conclusions, Exercises, and References

9 Defining the Algebra of Associative Arrays

9.1 Operations on Sets

9.2 Ordered Sets

9.3 Supremum and Infimum

9.4 Lattice

9.5 The Semirings of Interest

9.6 Conclusions, Exercises, and References

10 Structural Properties of Associative Arrays

10.1 Estimating Structure

10.2 Associative Array Formal Definition

10.3 Padding Associative Arrays with Zeros

10.4 Zero, Null, Zero-Sum-Free

10.5 Properties of Matrices and Associative Arrays

10.6 Properties of Zero Padding

10.7 Support and Size

10.8 Image and Rank

10.9 Example: Music

10.10 Example: Art

10.11 Properties of Element-Wise Addition

10.12 Properties of Element-Wise Multiplication

10.13 Array Multiplication

10.14 Closure of Operations between Arrays

10.15 Conclusions, Exercises, and References

11 Graph Construction and Graphical Patterns

11.1 Introduction

11.2 Adjacency and Incidence Array Definitions

11.3 Adjacency Array Construction

11.4 Graph Construction with Different Semirings

11.5 Special Arrays and Graphs

11.6 Key Ordering

11.7 Algebraic Properties

11.8 Subobject Properties

11.9 Conclusions, Exercises, and References

III Linear Systems

12 Survey of Common Transformations

12.1 Array Transformations

12.2 Identity

12.3 Contraction

12.4 Stretching

12.5 Rotation

12.6 Conclusions, Exercises, and References

13 Maps and Bases

13.1 Semimodules

13.2 Linear Maps

13.3 Linear Independence and Bases

13.4 Existence of Bases

13.5 Size of Bases

13.6 Semialgebras and the Algebra of Arrays

13.7 Conclusions, Exercises, and References

14 Linearity of Associative Arrays

14.1 The Null Space of Linear Maps

14.2 Supremum-Blank Algebras

14.3 Max-Blank Structure Theorem

14.4 Examples of Supremum-Blank Algebras

14.5 Explicit Computations of x(A,w) for Supremum-Blank Algebras

14.6 Conclusions, Exercises, and References

15 Eigenvalues and Eigenvectors

15.1 Introduction

15.2 Quasi-Inverses

15.3 Existence of Eigenvalues for Idempotent Multiplication

15.4 Strong Dependence and Characteristic Bipolynomial

15.5 Eigenanalysis for Irreducible Matrices for Invertible Multiplication

15.6 Eigen-Semimodules

15.7 Singular Value Decomposition

15.8 Conclusions, Exercises, and References

16 Higher Dimensions

16.1 d-Dimensional Associative Arrays

16.2 Key Ordering and Two-Dimensional Projections

16.3 Algebraic Properties

16.4 Sub-Array Properties

16.5 Conclusions, Exercises, and References

Appendix: Notation

Index

What People are Saying About This

David A. Bader, Professor and Chair, School of Computational Science and Engineering, Georgia Institute of Technology

In this era of big data, new methods for gaining insights promise to improve all aspects of our lives. This new textbook from Kepner and Jananthan is a fantastic resource for data scientists to understand the unifying mathematics for big data problems that covers everything from databases to graph analytics.

Jack Dongarra, Professor, University of Tennessee, Oak Ridge National Laboratory, and University of Manchester; coauthor of MPI: The Complete Reference, second edition, volume 1

Mathematics of Big Data presents a sophisticated view of matrices, graphs, databases, and spreadsheets, with many examples to help the discussion. The authors present the topic in three parts (applications and practice, mathematical foundations, and linear systems) with self-contained chapters to allow for easy reference and browsing. The algorithms are expressed in D4M, with execution possible in Matlab, Octave, and Julia. With exercises at the end of each section, the book can be used as a supplemental or primary text for a class on big data, algorithms, data structures, data analytics, linear algebra, or abstract algebra.
