Streaming Architecture: New Designs Using Apache Kafka and MapR Streams

Paperback

Overview

More and more data-driven companies are looking to adopt stream processing and streaming analytics. With this concise ebook, you’ll learn best practices for designing a reliable architecture that supports this emerging big-data paradigm.

Authors Ted Dunning and Ellen Friedman (Real World Hadoop) help you explore some of the best technologies to handle stream processing and analytics, with a focus on the upstream queuing or message-passing layer. To illustrate the effectiveness of these technologies, this book also includes specific use cases.

Ideal for developers and non-technical people alike, this book describes:

  • Key elements in good design for streaming analytics, focusing on the essential characteristics of the messaging layer
  • New messaging technologies, including Apache Kafka and MapR Streams, with links to sample code
  • Technology choices for streaming analytics: Apache Spark Streaming, Apache Flink, Apache Storm, and Apache Apex
  • How stream-based architectures are helpful to support microservices
  • Specific use cases such as fraud detection and geo-distributed data streams

Ted Dunning is Chief Applications Architect at MapR Technologies and is active in the open source community. He currently serves as VP for Incubator at the Apache Foundation, as a champion and mentor for a large number of projects, and as committer and PMC member of the Apache ZooKeeper and Drill projects. Ted is on Twitter as @ted_dunning.

Ellen Friedman, a committer for the Apache Drill and Apache Mahout projects, is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics. Ellen is on Twitter as @Ellen_Friedman.


Product Details

ISBN-13: 9781491953921
Publisher: O'Reilly Media, Incorporated
Publication date: 05/26/2016
Pages: 117
Product dimensions: 5.90(w) x 8.80(h) x 0.40(d)

About the Author

Ted Dunning is Chief Applications Architect at MapR Technologies and is active in the open source community.

He currently serves as VP for Incubator at the Apache Foundation, as a champion and mentor for a large number of projects, and as committer and PMC member of the Apache ZooKeeper and Drill projects. He developed the t-digest algorithm used to estimate extreme quantiles. T-digest has been adopted by several open source projects. He also developed the open source log-synth project described in the book Sharing Big Data Safely (O’Reilly).

Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems, built fraud-detection systems for ID Analytics (LifeLock), and has been issued 24 patents to date. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. Ted is on Twitter as @ted_dunning.

Ellen Friedman is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. She is a committer for the Apache Drill and Apache Mahout projects. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics, including molecular biology, nontraditional inheritance, and oceanography. Ellen is also coauthor of a book of magic-themed cartoons, A Rabbit Under the Hat (The Edition House). Ellen is on Twitter as @Ellen_Friedman.

Table of Contents

Preface

1 Why Stream?

Planes, Trains, and Automobiles: Connected Vehicles and the IoT

Streaming Data: Life As It Happens

Beyond Real Time: More Benefits of Streaming Architecture

Emerging Best Practices for Streaming Architectures

Healthcare Example with Data Streams

Streaming Data as a Central Aspect of Architectural Design

2 Stream-based Architecture

A Limited View: Single Real-Time Application

Key Aspects of a Universal Stream-based Architecture

Importance of the Messaging Technology

Choices for Real-Time Analytics

Comparison of Capabilities for Streaming Analytics

Summary

3 Streaming Architecture: Ideal Platform for Microservices

Why Microservices Matter

What Is Needed to Support Microservices

Microservices in More Detail

Designing a Streaming Architecture: Online Video Service Example

Importance of a Universal Microarchitecture

What's in a Name?

Why Use Distributed Files and NoSQL Databases?

New Design for the Video Service

Summary: The Converged Platform View

4 Kafka as Streaming Transport

Motivations for Kafka

Kafka Innovations

Kafka Basic Concepts

The Kafka APIs

Kafka Utility Programs

Kafka Gotchas

Summary

5 MapR Streams

Innovations in MapR Streams

History and Context of MapR's Streaming System

How MapR Streams Works

How to Configure MapR Streams

Geo-Distributed Replication

MapR Streams Gotchas

6 Fraud Detection with Streaming Data

Card Velocity

Fast Response Decision to the Question: "Is It Fraud?"

Multiuse Streaming Data

Scaling Up the Fraud Detector

Summary

7 Geo-Distributed Data Streams

Stakeholders

Design Goals

Design Choices

Advantages of Streams-based Geo-Replication

8 Putting It All Together

Benefits of Stream-based Architectures

Making the Transition to Streaming Architecture

Conclusion

A Additional Resources
