Relevant Search: With applications for Solr and Elasticsearch
Paperback (1st Edition)
Overview
Relevant Search demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
Users are accustomed to and expect instant, relevant search results. To achieve this, you must master the search engine. Yet for many developers, relevance ranking is mysterious or confusing.
About the Book
Relevant Search demystifies the subject and shows you that a search engine is a programmable relevance framework. You'll learn how to apply Elasticsearch or Solr to your business's unique ranking problems. The book demonstrates how to program relevance and how to incorporate secondary data sources, taxonomies, text analytics, and personalization. In practice, a relevance framework requires softer skills as well, such as collaborating with stakeholders to discover the right relevance requirements for your business. By the end, you'll be able to achieve a virtuous cycle of provable, measurable relevance improvements over a search product's lifetime.
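To give a flavor of what "programming relevance" means in practice: the book's examples are written in Python against Elasticsearch, and much of the tuning happens in the query body itself. The sketch below is illustrative only (the field names and boost values are hypothetical, not taken from the book); it builds a `multi_match` query that weights title matches more heavily than body-text matches.

```python
# A minimal sketch of query-side relevance tuning: a multi_match query
# with per-field boosts. Field names and the boost factor are hypothetical.
def make_query(user_search: str) -> dict:
    return {
        "query": {
            "multi_match": {
                "query": user_search,
                # "title^10" boosts title matches 10x relative to overview
                "fields": ["title^10", "overview"],
            }
        },
        "size": 10,
    }

body = make_query("basketball with cartoon aliens")
```

A body like this would be sent to Elasticsearch's search endpoint; adjusting the boosts, fields, and query strategy to match user intent is the kind of iterative work the book walks through.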
What's Inside
- Techniques for debugging relevance
- Applying search engine features to real problems
- Using the user interface to guide searchers
- A systematic approach to relevance
- A business culture focused on improving search
About the Reader
For developers trying to build smarter search with Elasticsearch or Solr.
About the Authors
Doug Turnbull is lead relevance consultant at OpenSource Connections, where he frequently speaks and blogs. John Berryman is a data engineer at Eventbrite, where he specializes in recommendations and search.
Foreword author, Trey Grainger, is a director of engineering at CareerBuilder and author of Solr in Action.
Table of Contents
- The search relevance problem
- Search under the hood
- Debugging your first relevance problem
- Taming tokens
- Basic multifield search
- Term-centric search
- Shaping the relevance function
- Providing relevance feedback
- Designing a relevance-focused search application
- The relevance-centered enterprise
- Semantic and personalized search
Product Details
| ISBN-13: | 9781617292774 |
| --- | --- |
| Publisher: | Manning |
| Publication date: | 07/09/2016 |
| Edition description: | 1st Edition |
| Pages: | 360 |
| Product dimensions: | 7.30(w) x 9.00(h) x 1.00(d) |
About the Author
John Berryman is a data scientist at Eventbrite, where he specializes in recommendations and search. He is interested in the potential of integrating semantic understanding into search and discovery applications.
Table of Contents
Foreword xiii
Preface xv
Acknowledgments xvii
About this book xix
About the authors xxiii
About the cover illustration xxiv
1 The search relevance problem 1
1.1 Your goal: gaining the skills of a relevance engineer 2
1.2 Why is search relevance so hard? 3
What's a "relevant" search result? 4
Search: there's no silver bullet! 6
1.3 Gaining insight from relevance research 6
Information retrieval 7
Can we use information retrieval to solve relevance? 8
1.4 How do you solve relevance? 10
1.5 More than technology: curation, collaboration, and feedback 12
1.6 Summary 14
2 Search under the hood 16
2.1 Search 101 17
What's a search document? 18
Searching the content 18
Exploring content through search 20
Getting content into the search engine 20
2.2 Search engine data structures 21
The inverted index 22
Other pieces of the inverted index 23
2.3 Indexing content: extraction, enrichment, analysis, and indexing 25
Extracting content into documents 26
Enriching documents to clean, augment, and merge data 27
Performing analysis 28
Indexing 31
2.4 Document search and retrieval 32
Boolean search: AND/OR/NOT 32
Boolean queries in Lucene-based search (MUST/MUST_NOT/SHOULD) 34
Positional and phrase matching 35
Enabling exploration: filtering, facets, and aggregations 36
Sorting, ranked results, and relevance 37
2.5 Summary 39
3 Debugging your first relevance problem 40
3.1 Applications to Solr and Elasticsearch: examples in Elasticsearch 41
3.2 Our most prominent data set: TMDB 42
3.3 Examples programmed in Python 43
3.4 Your first search application 43
Your first searches of the TMDB Elasticsearch index 46
3.5 Debugging query matching 48
Examining the underlying query strategy 49
Taking apart query parsing 50
Debugging analysis to solve matching issues 51
Comparing your query to the inverted index 53
Fixing our matching by changing analyzers 54
3.6 Debugging ranking 56
Decomposing the relevance score with Lucene's explain feature 57
The vector-space model, the relevance explain, and you 61
Practical caveats to the vector space model 64
Scoring matches to measure relevance 65
Computing weights with TF × IDF 67
Lies, damned lies, and similarity 68
Factoring in the search term's importance 70
Fixing Space Jam vs. alien ranking 70
3.7 Solved? Our work is never over! 72
3.8 Summary 73
4 Taming tokens 74
4.1 Tokens as document features 75
The matching process 76
Tokens, more than just words 76
4.2 Controlling precision and recall 77
Precision and recall by example 77
Analysis for precision or recall 80
Taking recall to extremes 84
4.3 Precision and recall: have your cake and eat it too 86
Scoring strength of a feature in a single field 86
Scoring beyond TF × IDF: multiple search terms and multiple fields 89
4.4 Analysis strategies 90
Dealing with delimiters 90
Capturing meaning with synonyms 93
Modeling specificity in search 96
Modeling specificity with synonyms 96
Modeling specificity with paths 99
Tokenize the world! 100
Tokenizing integers 101
Tokenizing geographic data 102
Tokenizing melodies 103
4.5 Summary 106
5 Basic multifield search 107
5.1 Signals and signal modeling 109
What is a signal? 109
Starting with the source data model 110
Implementing a signal 112
Signal modeling: data modeling for relevance 114
5.2 TMDB: search, the final frontier! 114
Violating the prime directive 116
Flattening nested docs 116
5.3 Signal modeling in field-centric search 118
Starting out with best_fields 122
Controlling field preference in search results 124
Better best_fields with more-precise signals? 126
Letting losers share the glory: calibrating best_fields 129
Counting multiple signals using most_fields 131
Boosting in most_fields 132
When additional matches don't matter 134
What's the verdict on most_fields? 135
5.4 Summary 135
6 Term-centric search 137
6.1 What is term-centric search? 138
6.2 Why do you need term-centric search? 140
Hunting for albino elephants 140
Finding an albino elephant in the Star Trek example 142
Avoiding signal discordance 144
Understanding the mechanics of signal discordance 145
6.3 Performing your first term-centric searches 147
Working with the term-centric ranking function 148
Running a term-centric query parser (into the ground) 151
Understanding field synchronicity 152
Field synchronicity and signal modeling 152
Query parsers and signal discordance 153
Tuning term-centric search 155
6.4 Solving signal discordance in term-centric search 157
Combining fields into custom all fields 157
Solving signal discordance with cross_fields 161
6.5 Combining field-centric and term-centric strategies: having your cake and eating it too 162
Grouping "like fields" together 163
Understanding the limits of like fields 164
Combining greedy naive search and conservative amplifiers 166
Term-centric vs. field-centric, and precision vs. recall 168
Considering filtering, boosting, and reranking 168
6.6 Summary 169
7 Shaping the relevance function 170
7.1 What do we mean by score shaping? 171
7.2 Boosting: shaping by promoting results 172
Boosting: the final frontier 173
When boosting: add or multiply? Boolean or function query? 174
You choose door A: additive boosting with Boolean queries 176
You choose door B: function queries using math for ranking 179
Hands-on with function queries: simple multiplicative boosting 180
Boosting basics: signals, signals everywhere 182
7.3 Filtering: shaping by excluding results 182
7.4 Score-shaping strategies for satisfying business needs 184
Search all the movies! 185
Modeling your boosting signals 186
Building the ranking function: adding high-value tiers 189
High-value tier scored with a function query 193
Ignoring TF × IDF 194
Capturing general-quality metrics 195
Achieving users' recency goals 197
Combining the function queries 200
Putting it all together! 202
7.5 Summary 203
8 Providing relevance feedback 204
8.1 Relevance feedback at the search box 206
Providing immediate results with search-as-you-type 206
Helping users find the best query with search completion 207
Correcting typos and misspellings with search suggestions 215
8.2 Relevance feedback while browsing 218
Building faceted browsing 219
Providing breadcrumb navigation 221
Selecting alternative results ordering 222
8.3 Relevance feedback in the search results listing 223
What information should be presented in listing items? 224
Relevance feedback through snippets and highlighting 225
Grouping similar documents 228
Helping the user when there are no results 230
8.4 Summary 231
9 Designing a relevance-focused search application 232
9.1 Yowl! The awesome new start-up! 233
9.2 Gathering information and requirements 234
Understand users and their information needs 234
Understand business needs 236
Identify required and available information 236
9.3 Designing the search application 238
Visualize the user's experience 239
Define fields and model signals 241
Combine and balance signals 242
9.4 Deploying, monitoring, and improving 252
Monitor 253
Identify problems and fix them! 254
9.5 Knowing when good is good enough 255
9.6 Summary 256
10 The relevance-centered enterprise 257
10.1 Feedback: the bedrock of the relevance-centered enterprise 259
10.2 Why user-focused culture before data-driven culture? 261
10.3 Flying relevance-blind 263
10.4 Relevance feedback awakenings: domain experts and expert users 265
10.5 Relevance feedback maturing: content curation 267
The role of the content curator 268
The risk of miscommunication with the content curator 269
10.6 Relevance streamlined: engineer/curator pairing 270
10.7 Relevance accelerated: test-driven relevance 272
Understanding test-driven relevance 272
Using test-driven relevance with user behavioral data 275
10.8 Beyond test-driven relevance: learning to rank 276
10.9 Summary 277
11 Semantic and personalized search 279
11.1 Personalizing search based on user profiles 281
Gathering user profile information 282
Tying profile information back to the search index 282
11.2 Personalizing search based on user behavior 283
Introducing collaborative filtering 283
Basic collaborative filtering using co-occurrence counting 284
Tying user behavior information back to the search index 289
11.3 Basic methods for building concept search 293
Building concept signals 294
Augmenting content with synonyms 295
11.4 Building concept search using machine learning 296
The importance of phrases in concept search 297
11.5 The personalized search-concept search connection 298
11.6 Recommendation as a generalization of search 299
Replacing search with recommendation 300
11.7 Best wishes on your search relevance journey 301
11.8 Summary 302
Appendix A Indexing directly from TMDB 303
Appendix B Solr reader's companion 309
Index 323