Relevant Search: With applications for Solr and Elasticsearch

Paperback (1st Edition)

$44.99 

Overview

Summary

Relevant Search demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Users are accustomed to and expect instant, relevant search results. To achieve this, you must master the search engine. Yet for many developers, relevance ranking is mysterious or confusing.

About the Book

Relevant Search demystifies the subject and shows you that a search engine is a programmable relevance framework. You'll learn how to apply Elasticsearch or Solr to your business's unique ranking problems. The book demonstrates how to program relevance and how to incorporate secondary data sources, taxonomies, text analytics, and personalization. In practice, a relevance framework requires softer skills as well, such as collaborating with stakeholders to discover the right relevance requirements for your business. By the end, you'll be able to achieve a virtuous cycle of provable, measurable relevance improvements over a search product's lifetime.

What's Inside

  • Techniques for debugging relevance
  • Applying search engine features to real problems
  • Using the user interface to guide searchers
  • A systematic approach to relevance
  • A business culture focused on improving search

About the Reader

For developers trying to build smarter search with Elasticsearch or Solr.

About the Authors

Doug Turnbull is lead relevance consultant at OpenSource Connections, where he frequently speaks and blogs. John Berryman is a data engineer at Eventbrite, where he specializes in recommendations and search.

Foreword author Trey Grainger is a director of engineering at CareerBuilder and the author of Solr in Action.

Table of Contents

  1. The search relevance problem
  2. Search under the hood
  3. Debugging your first relevance problem
  4. Taming tokens
  5. Basic multifield search
  6. Term-centric search
  7. Shaping the relevance function
  8. Providing relevance feedback
  9. Designing a relevance-focused search application
  10. The relevance-centered enterprise
  11. Semantic and personalized search

Product Details

ISBN-13: 9781617292774
Publisher: Manning
Publication date: 07/09/2016
Edition description: 1st Edition
Pages: 360
Product dimensions: 7.30(w) x 9.00(h) x 1.00(d)

About the Authors

Doug Turnbull is Staff Relevance Engineer at Spotify and is the former Chief Technical Officer at OpenSource Connections. He is the co-author of the book Relevant Search, and contributed chapters 10-12 on “Learning to Rank”, “Automated Learning to Rank with Click Models”, and “Overcoming Bias in Learned Relevance Models”.

John Berryman is a data scientist at Eventbrite, where he specializes in recommendations and search. He is interested in the potential of integrating semantic understanding into search and discovery applications.

Table of Contents

Foreword xiii

Preface xv

Acknowledgments xvii

About this book xix

About the authors xxiii

About the cover illustration xxiv

1 The search relevance problem 1

1.1 Your goal: gaining the skills of a relevance engineer 2

1.2 Why is search relevance so hard? 3

What's a "relevant" search result? 4

Search: there's no silver bullet! 6

1.3 Gaining insight from relevance research 6

Information retrieval 7

Can we use information retrieval to solve relevance? 8

1.4 How do you solve relevance? 10

1.5 More than technology: curation, collaboration, and feedback 12

1.6 Summary 14

2 Search under the hood 16

2.1 Search 101 17

What's a search document? 18

Searching the content 18

Exploring content through search 20

Getting content into the search engine 20

2.2 Search engine data structures 21

The inverted index 22

Other pieces of the inverted index 23

2.3 Indexing content: extraction, enrichment, analysis, and indexing 25

Extracting content into documents 26

Enriching documents to clean, augment, and merge data 27

Performing analysis 28

Indexing 31

2.4 Document search and retrieval 32

Boolean search: AND/OR/NOT 32

Boolean queries in Lucene-based search (MUST/MUST_NOT/SHOULD) 34

Positional and phrase matching 35

Enabling exploration: filtering, facets, and aggregations 36

Sorting, ranked results, and relevance 37

2.5 Summary 39

3 Debugging your first relevance problem 40

3.1 Applications to Solr and Elasticsearch: examples in Elasticsearch 41

3.2 Our most prominent data set: TMDB 42

3.3 Examples programmed in Python 43

3.4 Your first search application 43

Your first searches of the TMDB Elasticsearch index 46

3.5 Debugging query matching 48

Examining the underlying query strategy 49

Taking apart query parsing 50

Debugging analysis to solve matching issues 51

Comparing your query to the inverted index 53

Fixing our matching by changing analyzers 54

3.6 Debugging ranking 56

Decomposing the relevance score with Lucene's explain feature 57

The vector-space model, the relevance explain, and you 61

Practical caveats to the vector space model 64

Scoring matches to measure relevance 65

Computing weights with TF × IDF 67

Lies, damned lies, and similarity 68

Factoring in the search term's importance 70

Fixing Space Jam vs. Alien ranking 70

3.7 Solved? Our work is never over! 72

3.8 Summary 73

4 Taming tokens 74

4.1 Tokens as document features 75

The matching process 76

Tokens, more than just words 76

4.2 Controlling precision and recall 77

Precision and recall by example 77

Analysis for precision or recall 80

Taking recall to extremes 84

4.3 Precision and recall: have your cake and eat it too 86

Scoring strength of a feature in a single field 86

Scoring beyond TF × IDF: multiple search terms and multiple fields 89

4.4 Analysis strategies 90

Dealing with delimiters 90

Capturing meaning with synonyms 93

Modeling specificity in search 96

Modeling specificity with synonyms 96

Modeling specificity with paths 99

Tokenize the world! 100

Tokenizing integers 101

Tokenizing geographic data 102

Tokenizing melodies 103

4.5 Summary 106

5 Basic multifield search 107

5.1 Signals and signal modeling 109

What is a signal? 109

Starting with the source data model 110

Implementing a signal 112

Signal modeling: data modeling for relevance 114

5.2 TMDB: search, the final frontier! 114

Violating the prime directive 116

Flattening nested docs 116

5.3 Signal modeling in field-centric search 118

Starting out with best_fields 122

Controlling field preference in search results 124

Better best_fields, with more-precise signals 126

Letting losers share the glory: calibrating best_fields 129

Counting multiple signals using most_fields 131

Boosting in most_fields 132

When additional matches don't matter 134

What's the verdict on most_fields? 135

5.4 Summary 135

6 Term-centric search 137

6.1 What is term-centric search? 138

6.2 Why do you need term-centric search? 140

Hunting for albino elephants 140

Finding an albino elephant in the Star Trek example 142

Avoiding signal discordance 144

Understanding the mechanics of signal discordance 145

6.3 Performing your first term-centric searches 147

Working with the term-centric ranking function 148

Running a term-centric query parser (into the ground) 151

Understanding field synchronicity 152

Field synchronicity and signal modeling 152

Query parsers and signal discordance 153

Tuning term-centric search 155

6.4 Solving signal discordance in term-centric search 157

Combining fields into custom all fields 157

Solving signal discordance with cross_fields 161

6.5 Combining field-centric and term-centric strategies: having your cake and eating it too 162

Grouping "like fields" together 163

Understanding the limits of like fields 164

Combining greedy naive search and conservative amplifiers 166

Term-centric vs. field-centric, and precision vs. recall 168

Considering filtering, boosting, and reranking 168

6.6 Summary 169

7 Shaping the relevance function 170

7.1 What do we mean by score shaping? 171

7.2 Boosting: shaping by promoting results 172

Boosting: the final frontier 173

When boosting: add or multiply? Boolean or function query? 174

You choose door A: additive boosting with Boolean queries 176

You choose door B: function queries using math for ranking 179

Hands-on with function queries: simple multiplicative boosting 180

Boosting basics: signals, signals everywhere 182

7.3 Filtering: shaping by excluding results 182

7.4 Score-shaping strategies for satisfying business needs 184

Search all the movies! 185

Modeling your boosting signals 186

Building the ranking function: adding high-value tiers 189

High-value tier scored with a function query 193

Ignoring TF × IDF 194

Capturing general-quality metrics 195

Achieving users' recency goals 197

Combining the function queries 200

Putting it all together! 202

7.5 Summary 203

8 Providing relevance feedback 204

8.1 Relevance feedback at the search box 206

Providing immediate results with search-as-you-type 206

Helping users find the best query with search completion 207

Correcting typos and misspellings with search suggestions 215

8.2 Relevance feedback while browsing 218

Building faceted browsing 219

Providing breadcrumb navigation 221

Selecting alternative results ordering 222

8.3 Relevance feedback in the search results listing 223

What information should be presented in listing items? 224

Relevance feedback through snippets and highlighting 225

Grouping similar documents 228

Helping the user when there are no results 230

8.4 Summary 231

9 Designing a relevance-focused search application 232

9.1 Yowl! The awesome new start-up! 233

9.2 Gathering information and requirements 234

Understand users and their information needs 234

Understand business needs 236

Identify required and available information 236

9.3 Designing the search application 238

Visualize the user's experience 239

Define fields and model signals 241

Combine and balance signals 242

9.4 Deploying, monitoring, and improving 252

Monitor 253

Identify problems and fix them! 254

9.5 Knowing when good is good enough 255

9.6 Summary 256

10 The relevance-centered enterprise 257

10.1 Feedback: the bedrock of the relevance-centered enterprise 259

10.2 Why user-focused culture before data-driven culture? 261

10.3 Flying relevance-blind 263

10.4 Relevance feedback awakenings: domain experts and expert users 265

10.5 Relevance feedback maturing: content curation 267

The role of the content curator 268

The risk of miscommunication with the content curator 269

10.6 Relevance streamlined: engineer/curator pairing 270

10.7 Relevance accelerated: test-driven relevance 272

Understanding test-driven relevance 272

Using test-driven relevance with user behavioral data 275

10.8 Beyond test-driven relevance: learning to rank 276

10.9 Summary 277

11 Semantic and personalized search 279

11.1 Personalizing search based on user profiles 281

Gathering user profile information 282

Tying profile information back to the search index 282

11.2 Personalizing search based on user behavior 283

Introducing collaborative filtering 283

Basic collaborative filtering using co-occurrence counting 284

Tying user behavior information back to the search index 289

11.3 Basic methods for building concept search 293

Building concept signals 294

Augmenting content with synonyms 295

11.4 Building concept search using machine learning 296

The importance of phrases in concept search 297

11.5 The personalized search-concept search connection 298

11.6 Recommendation as a generalization of search 299

Replacing search with recommendation 300

11.7 Best wishes on your search relevance journey 301

11.8 Summary 302

Appendix A Indexing directly from TMDB 303

Appendix B Solr reader's companion 309

Index 323
