Chapter 1: Introduction
Without even realizing it, everyone is affected by poor data quality. Some are affected directly in annoying ways, such as receiving two or three identical mailings from the same sales organization in the same week. Some are affected less directly, such as by a 20-minute wait on hold for a customer service department. Some are harmed by deliberate fraud, such as identity theft. Whenever poor data quality, inconsistencies, and errors bloat companies and government agencies and hamper their ability to provide the best possible service, everyone suffers.
Data quality may seem a hazy concept, but its absence severely hampers an organization's ability to accumulate and manage enterprise-wide knowledge effectively. The goal of this book is to demonstrate that data quality is not an esoteric notion but something that can be quantified, measured, and improved, all with a strict focus on return on investment. Our premise is that knowledge management is a pillar that must stand securely on a pedestal of data quality; by the end of this book, the reader should be able to build that pedestal.
This book covers the following areas:
- Data ownership paradigms
- The definition of data quality
- An economic framework for data quality, including steps in building a return on investment model to justify the costs of a data quality program
- The dimensions of data quality
- Using statistical process control as a tool for measurement
- Data domains and mappings between those domains
- Data quality rules and business rules
- Measurement and current state assessment
- Data quality requirements analysis
- Metadata and policy
- Rules-based processing
- Discovery of metadata and data quality and business rules
- Data cleansing
- Root cause analysis and supplier management
- Data enhancement
- Putting it all into practice
The end of the book summarizes the processes discussed and the steps to building a data quality practice. Before we dive into the technical components, however, it is worthwhile to spend some time looking at some real-world examples for motivation. In the next section, you will see some examples of "data quality horror stories" - tales of adverse effects of poor data quality.
1.1.1 Bank Deposit?
In November 1998, the Associated Press reported that a New York man allegedly brought a dead deer into a bank in Stamford, Connecticut, because he was upset with the bank's service. Police say the 70-year-old argued with a teller over a clerical mistake involving his checking account. Apparently still unhappy with the teller, he went home, got the deer carcass, and brought it back to the branch office.
1.1.2 CD Mail Fraud
Here is a news story taken from the Associated Press newswire, printed with permission.
Newark - For four years a Middlesex County man fooled the computer fraud programs at two music-by-mail clubs, using 1,630 aliases to buy music CDs at rates offered only to first-time buyers.
David Russo, 33, of Sayreville, NJ, admitted yesterday that he received 22,260 CDs by making each address - even if it listed the same post office box - different enough to evade fraud-detection computer programs.
Among his methods: adding fictitious apartment numbers, unneeded direction abbreviations and extra punctuation marks. (Emphasis mine)
The scam is believed to be the largest of its kind in the nation, said Assistant U.S. Attorney Scott S. Christie, who prosecuted the case.
The introductory offers typically provided nine free CDs with the purchase of one CD at the regular price, plus shipping and handling. Other CDs then had to be purchased later to fulfill club requirements.
Russo paid about $56,000 for CDs, said Paul B. Brickfield, his lawyer, or an average of $2.50 each. He then sold the CDs at flea markets for about $10 each, Brickfield said.
Russo pleaded guilty to a single count of mail fraud. He faces about 12 to 18 months in prison and a fine of up to $250,000.
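The tricks named in the story (fictitious apartment numbers, unneeded direction abbreviations, extra punctuation) defeat any check that compares addresses character by character. A minimal sketch of the kind of normalization that collapses such variants to a single matching key follows. The function name `match_key` and the token tables are hypothetical illustrations, not the clubs' actual fraud software, and real address standardization is considerably more involved.

```python
import re

# Hypothetical table: directional words and abbreviations to ignore when matching.
DIRECTIONS = {"north", "south", "east", "west", "n", "s", "e", "w"}

# Hypothetical table: tokens that introduce a unit designator (apartment, suite, ...).
UNIT_MARKERS = {"apt", "apartment", "unit", "suite", "ste", "#"}


def match_key(address: str) -> str:
    """Reduce an address to a coarse key for duplicate detection.

    Lowercases, removes punctuation, drops directional tokens and
    unit designators (with the unit number that follows), and
    collapses whitespace.
    """
    text = address.lower()
    # Make '#' its own token so "#5" is recognized as a unit marker.
    text = text.replace("#", " # ")
    # Delete periods and apostrophes so "P.O." becomes "po".
    text = re.sub(r"[.']", "", text)
    # Turn any remaining punctuation (commas, dashes, ...) into spaces.
    text = re.sub(r"[^a-z0-9# ]+", " ", text)

    kept = []
    skip_next = False
    for tok in text.split():
        if skip_next:
            skip_next = False  # this token is the unit number; drop it
            continue
        if tok in UNIT_MARKERS:
            skip_next = True  # drop the marker and the number after it
            continue
        if tok in DIRECTIONS:
            continue  # ignore added direction abbreviations
        kept.append(tok)
    return " ".join(kept)


if __name__ == "__main__":
    variants = [
        "PO Box 123",
        "P.O. Box 123, Apt. 4B",
        "P.O. Box 123 N.",
        "PO Box 123 #5",
    ]
    # All four variants collapse to the same key.
    print({match_key(v) for v in variants})  # {'po box 123'}
```

The key is deliberately coarse: dropping directions and unit numbers will also merge some legitimately distinct addresses, so in a fraud-detection setting it is better suited to flagging candidate duplicates for review than to rejecting orders automatically.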
1.1.3 Mars Orbiter
The Mars Climate Orbiter, a key part of NASA's program to explore Mars, vanished in September 1999 after its rockets were fired to bring it into orbit around the planet. An investigative board later found that the root cause of the loss was the failure of NASA engineers to convert English measures of rocket thrust to newtons, the metric unit of force. This discrepancy between the two measures, which was relatively small, caused the orbiter to approach Mars at too low an altitude, and instead of reaching a safe orbit, it smashed into the planet. The result was the loss of a $125 million spacecraft and a significant setback in NASA's ability to explore Mars...
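At its core, this failure is a data quality problem: a value was produced in one unit and consumed as if it were in another. A minimal sketch of one defensive pattern follows, in which every impulse value carries an explicit unit tag and must be converted to a single canonical unit before use. The `Impulse` class, the unit strings, and the assumption that the quantity involved is impulse (thrust times time) are hypothetical illustrations; the only fact relied on is the standard definition 1 pound-force = 4.4482216152605 newtons.

```python
from dataclasses import dataclass

# Exact by definition: 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N.
NEWTONS_PER_POUND_FORCE = 4.4482216152605

# Hypothetical unit tags and their factors into the canonical unit, newton-seconds.
TO_NEWTON_SECONDS = {
    "N*s": 1.0,
    "lbf*s": NEWTONS_PER_POUND_FORCE,
}


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always travels with its unit (hypothetical type)."""

    value: float
    unit: str

    def to_newton_seconds(self) -> float:
        # Refuse to guess: an unrecognized unit is an error, not a silent pass-through.
        if self.unit not in TO_NEWTON_SECONDS:
            raise ValueError(f"unknown impulse unit: {self.unit!r}")
        return self.value * TO_NEWTON_SECONDS[self.unit]


if __name__ == "__main__":
    reported = Impulse(10.0, "lbf*s")     # produced in English units
    print(reported.to_newton_seconds())   # roughly 44.48 N*s
    # Reading the raw 10.0 as N*s would understate the impulse by a factor of about 4.45.
```

The design choice is to make the unit part of the data rather than an unwritten convention between two teams, so a mismatch surfaces as an explicit conversion or an error instead of a silently wrong number.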