Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs available in Hardcover
- ISBN-10: 0262038390
- ISBN-13: 9780262038393
- Pub. Date: 07/17/2018
- Publisher: MIT Press
Hardcover
$80.00
Overview
Today, the volume, velocity, and variety of data are increasing rapidly across a range of fields, including Internet search, healthcare, finance, social media, wireless devices, and cybersecurity. Indeed, these data are growing at a rate beyond our capacity to analyze them. The tools—including spreadsheets, databases, matrices, and graphs—developed to address this challenge all reflect the need to store and operate on data as whole sets rather than as individual elements. This book presents the common mathematical foundations of these data sets that apply across many applications and technologies. Associative arrays unify and simplify data, allowing readers to look past the differences among the various tools and leverage their mathematical similarities in order to solve the hardest big data challenges.
The book first introduces the concept of the associative array in practical terms, presents the associative array manipulation system D4M (Dynamic Distributed Dimensional Data Model), and describes the application of associative arrays to graph analysis and machine learning. It provides a mathematically rigorous definition of associative arrays and describes the properties of associative arrays that arise from this definition. Finally, the book shows how concepts of linearity can be extended to encompass associative arrays. Mathematics of Big Data can be used as a textbook or reference by engineers, scientists, mathematicians, computer scientists, and software engineers who analyze big data.
Product Details
| Detail | Value |
|---|---|
| ISBN-13 | 9780262038393 |
| Publisher | MIT Press |
| Publication date | 07/17/2018 |
| Series | MIT Lincoln Laboratory Series |
| Pages | 448 |
| Product dimensions | 7.00 (w) x 9.10 (h) x 1.30 (d) in |
| Age range | 18 years |
About the Author
Hayden Jananthan is a PhD candidate in the Department of Mathematics at Vanderbilt University.
Charles E. Leiserson is Professor of Computer Science and Engineering at the Massachusetts Institute of Technology.
Table of Contents
Foreword xi
Preface xiii
About the Authors xvii
About the Cover xix
Acknowledgments xxiii
I Applications and Practice 1
1 Introduction and Overview 3
1.1 Mathematics of Data 3
1.2 Data in the World 5
1.3 Mathematical Foundations 9
1.4 Making Data Rigorous 14
1.5 Conclusions, Exercises, and References 16
2 Perspectives on Data 19
2.1 Interrelations 19
2.2 Spreadsheets 20
2.3 Databases 22
2.4 Matrices 26
2.5 Graphs 27
2.6 Map Reduce 29
2.7 Other Perspectives 30
2.8 Conclusions, Exercises, and References 31
3 Dynamic Distributed Dimensional Data Model 37
3.1 Background 37
3.2 Design 38
3.3 Matrix Mathematics 39
3.4 Common SQL, NoSQL, NewSQL Interface 40
3.5 Key-Value Store Database Schema 41
3.6 Data-Independent Analytics 44
3.7 Parallel Performance 49
3.8 Computing on Masked Data 51
3.9 Conclusions, Exercises, and References 53
4 Associative Arrays and Musical Metadata 57
4.1 Data and Metadata 57
4.2 Dense Data 58
4.3 Dense Operations 60
4.4 Sparse Data 62
4.5 Sparse Operations 63
4.6 Conclusions, Exercises, and References 65
5 Associative Arrays and Abstract Art 69
5.1 Visual Abstraction 69
5.2 Minimal Adjacency Array 71
5.3 Symmetric Adjacency Array 73
5.4 Weighted Adjacency Array 75
5.5 Incidence Array 75
5.6 Conclusions, Exercises, and References 78
6 Manipulating Graphs with Matrices 81
6.1 Introduction 81
6.2 Matrix Indices and Values 86
6.3 Composable Graph Operations and Linear Systems 89
6.4 Matrix Graph Operations Overview 96
6.5 Graph Algorithms and Diverse Semirings 105
6.6 Conclusions, Exercises, and References 108
7.1 Introduction 115
7.2 Data Representation 116
7.3 Graph Construction 118
7.4 Adjacency Array Graph Traversal 120
7.5 Incidence Array Graph Traversal 122
7.6 Vertex Degree Centrality 126
7.7 Edge Degree Centrality 129
7.8 Eigenvector Centrality 129
7.9 Singular Value Decomposition 133
7.10 PageRank 136
7.11 Deep Neural Networks 138
7.12 Conclusions, Exercises, and References 140
II Mathematical Foundations 145
8 Visualising the Algebra of Associative Arrays 147
8.1 Associative Array Analogs of Matrix Operations 147
8.2 Abstract Algebra for Computer Scientists and Engineers 150
8.3 Depicting Mathematics 152
8.4 Associative Array Class Diagrams 153
8.5 Set 154
8.6 Semiring 155
8.7 Linear Algebra 158
8.8 Ordered Sets 160
8.9 Boolean Algebra 162
8.10 Associative Array Algebra 164
8.11 Conclusions, Exercises, and References 164
9 Defining the Algebra of Associative Arrays 169
9.1 Operations on Sets 169
9.2 Ordered Sets 175
9.3 Supremum and Infimum 177
9.4 Lattice 181
9.5 The Semirings of Interest 186
9.6 Conclusions, Exercises, and References 189
10 Structural Properties of Associative Arrays 193
10.1 Estimating Structure 193
10.2 Associative Array Formal Definition 194
10.3 Padding Associative Arrays with Zeros 197
10.4 Zero, Null, Zero-Sum-Free 198
10.5 Properties of Matrices and Associative Arrays 199
10.6 Properties of Zero Padding 201
10.7 Support and Size 207
10.8 Image and Rank 208
10.9 Example: Music 209
10.10 Example: Art 211
10.11 Properties of Element-Wise Addition 213
10.12 Properties of Element-Wise Multiplication 217
10.13 Array Multiplication 221
10.14 Closure of Operations between Arrays 228
10.15 Conclusions, Exercises, and References 229
11 Graph Construction and Graphical Patterns 235
11.1 Introduction 235
11.2 Adjacency and Incidence Array Definitions 236
11.3 Adjacency Array Construction 242
11.4 Graph Construction with Different Semirings 250
11.5 Special Arrays and Graphs 255
11.6 Key Ordering 258
11.7 Algebraic Properties 263
11.8 Subobject Properties 264
11.9 Conclusions, Exercises, and References 266
III Linear Systems 269
12 Survey of Common Transformations 271
12.1 Array Transformations 271
12.2 Identity 274
12.3 Contraction 290
12.4 Stretching 293
12.5 Rotation 297
12.6 Conclusions, Exercises, and References 299
13 Maps and Bases 303
13.1 Semimodules 303
13.2 Linear Maps 307
13.3 Linear Independence and Bases 309
13.4 Existence of Bases 312
13.5 Size of Bases 313
13.6 Semialgebras and the Algebra of Arrays 317
13.7 Conclusions, Exercises, and References 320
14 Linearity of Associative Arrays 323
14.1 The Null Space of Linear Maps 323
14.2 Supremum-Blank Algebras 326
14.3 Max-Blank Structure Theorem 334
14.4 Examples of Supremum-Blank Algebras 338
14.5 Explicit Computations of x(A,w) for Supremum-Blank Algebras 342
14.6 Conclusions, Exercises, and References 348
15 Eigenvalues and Eigenvectors 351
15.1 Introduction 351
15.2 Quasi-Inverses 353
15.3 Existence of Eigenvalues for Idempotent Multiplication 359
15.4 Strong Dependence and Characteristic Bipolynomial 360
15.5 Eigenanalysis for Irreducible Matrices for Invertible Multiplication 367
15.6 Eigen-Semimodules 373
15.7 Singular Value Decomposition 378
15.8 Conclusions, Exercises, and References 385
16 Higher Dimensions 389
16.1 d-Dimensional Associative Arrays 389
16.2 Key Ordering and Two-Dimensional Projections 392
16.3 Algebraic Properties 398
16.4 Sub-Array Properties 400
16.5 Conclusions, Exercises, and References 402
Appendix: Notation 405
Index 413
What People are Saying About This
In this era of big data, new methods for gaining insights promise to improve all aspects of our lives. This new textbook from Kepner and Jananthan is a fantastic resource for data scientists to understand the unifying mathematics for big data problems that covers everything from databases to graph analytics.
David A. Bader, Professor and Chair, School of Computational Science and Engineering, Georgia Institute of Technology
Mathematics of Big Data presents a sophisticated view of matrices, graphs, databases, and spreadsheets, with many examples to help the discussion. The authors present the topic in three parts (applications and practice, mathematical foundations, and linear systems), with self-contained chapters to allow for easy reference and browsing. The algorithms are expressed in D4M, with execution possible in Matlab, Octave, and Julia. With exercises at the end of each section, the book can be used as a supplemental or primary text for a class on big data, algorithms, data structures, data analytics, linear algebra, or abstract algebra.
Jack Dongarra, Professor, University of Tennessee, Oak Ridge National Laboratory, and University of Manchester; coauthor of MPI: The Complete Reference, second edition, volume 1