Text Mining: A Divining Rod For Insurance Fraud?

We’ve previously talked about how technology can be used in fraud investigations; basically, it can help investigators pay attention to the important things. The trick is learning, and keeping up with, what is important. A good tool for doing this is text mining. By taking advantage of a computer’s ability to “read” many more pages of text than a human ever could hope to, in a fraction of the time, and to draw patterns and other relevant information from all of that “reading,” text mining can point an investigator in the right direction by letting her know what warning signs to look for and address. Turning words in free-flowing narratives, emails, statements, and notes into useful facts that can be studied and analyzed, text mining can, and should, help any investigator.

The text mining process has four parts, each as essential as the others:

1. Information Retrieval Systems: These weed out irrelevant documents in order to concentrate on the ones that are more likely to get you the answers you are looking for. Say you’re looking for the best attorney to help you buy a house; chances are you’d be smart to look for ones who have “real estate,” “houses,” or “homes” somewhere in the description of their practice. This won’t necessarily get you the attorney you want, but it will make your search a whole lot easier. This is what information retrieval systems do.

2. Natural Language Processing: This allows a computer to “read” the text by breaking it down by sentence structure and parts of speech, and by figuring out what a word means, even when it has more than one meaning. Think of how hard it would be to understand a sentence if you couldn’t put it into context. “Break a leg” would mean one thing if said to a performer, another if said to a doctor or even a lawyer.

3. Information Extraction: This allows a computer to identify, and pull out, multi-word terms, names, and facts from the text of a written document. Instead of a free-flowing narrative, you now get concrete facts that you can analyze for patterns and meanings.

4. Data Mining: This is really the way to discover what you’re looking for: the heretofore hidden meanings or unknown correlations between facts. A good example is when a drug developed to battle one condition, such as minoxidil for lowering blood pressure, is found to treat another, such as male pattern baldness. You have to be able to search through all of the scientific studies in order to know that this is even possible, before you even think to confirm it with your own investigation.
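To make the four parts above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the claim notes are invented, the function names are hypothetical, and each stage is a toy stand-in (real natural language processing, for example, does far more than split text into lowercase words).

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical claim notes, invented purely for illustration.
NOTES = [
    "Insured stated the kitchen fire started while the house was vacant.",
    "Claimant treated by Dr. Smith after the rear-end collision.",
    "Insured reported a fire in the garage; the house was vacant for a month.",
    "Policy renewal reminder mailed to insured.",
]

def retrieve(notes, keywords):
    """Information retrieval: keep only the notes that mention a keyword."""
    return [n for n in notes if any(k in n.lower() for k in keywords)]

def tokenize(note):
    """Crude stand-in for natural language processing: lowercase words."""
    return re.findall(r"[a-z']+", note.lower())

def extract_facts(tokens, terms):
    """Information extraction: which terms of interest appear in the note."""
    return {t for t in terms if t in tokens}

def co_occurrences(fact_sets):
    """Data mining: count how often two facts appear in the same note."""
    counts = Counter()
    for facts in fact_sets:
        counts.update(combinations(sorted(facts), 2))
    return counts

TERMS = {"fire", "vacant", "kitchen", "garage"}  # hypothetical terms of interest

relevant = retrieve(NOTES, ["fire"])
fact_sets = [extract_facts(tokenize(n), TERMS) for n in relevant]
pairs = co_occurrences(fact_sets)
print(pairs.most_common(3))
```

In this toy data, the pairing of "fire" and "vacant" shows up in both fire notes. That is the kind of correlation the data mining step is meant to surface, here on a scale small enough to check by eye.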

A recent study, led by Philip Borba of Milliman Inc., shows just how useful text mining can be to the insurance industry. Using text mining, the researchers were better able to predict how severe an auto accident would be, and whether anyone would be hurt in it, based on whether the drivers were using a medication or drug. The best part, though, was that the study did it using only free-form narratives, really general descriptions of how each accident occurred.

The researchers started with narrative reports of approximately 7,000 auto accidents, each containing between 400 and 600 words, and looked for indications of drug use in each. It would be difficult, if not impossible, even for a dedicated team of professionals, to sort through at least 2.8 million words (7,000 reports at 400 words each) to look for any indications of drug use, let alone different types, and draw any meaningful conclusions from them. It would be equally useless if a computer could do nothing more than recognize, and tally, each particular word, without recognizing that particular groups of words have particular meanings and that certain of these indicate drug use. What Borba did, however, was different. He broke down the narratives into phrases of one to six words, over 3 million of them. He then had computers “read” the reports to identify particular combinations of phrases which indicate that someone was on medication or using a drug. What he found was enlightening.
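Breaking a narrative into phrases of one to six words can be sketched in a few lines of Python. This is only an illustration of the general idea, not the study's actual method: the sample narrative, the function name, and the short list of indicator phrases are all hypothetical.

```python
def phrases(text, max_len=6):
    """Return every run of 1 to max_len consecutive words in the text."""
    words = text.lower().split()
    out = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out

# Hypothetical narrative and indicator phrases, invented for illustration.
narrative = "driver said he had taken pain medication before the crash"
INDICATORS = {"pain medication", "prescription"}

all_phrases = phrases(narrative)
hits = [p for p in all_phrases if p in INDICATORS]

print(len(all_phrases), "phrases generated")
print("indicator phrases found:", hits)
```

Even this ten-word sentence yields dozens of phrases, which gives a sense of how 7,000 narratives produce millions of them. It also shows why matching on phrases matters: "pain medication" carries a meaning that "pain" and "medication" counted separately do not.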

Under the same conditions (good weather, daytime driving, and no alcohol use), the likelihood that someone would be injured jumped from 57% to:

– 72% when either one of the drivers was on a prescription or a specific drug was mentioned;

– 75% when the narrative mentioned medication; and

– 85% when a specific illegal drug was mentioned.
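To put those jumps in perspective, the short Python sketch below works out how far each figure sits above the 57% baseline, both in percentage points and relative to the baseline. The percentages come from the figures above; the labels and the function name are shorthand chosen for this example.

```python
BASELINE = 57  # injury likelihood (%) with no drug or medication mentioned

# Injury likelihood (%) for each group reported above.
RATES = {
    "prescription or specific drug mentioned": 72,
    "medication mentioned": 75,
    "specific illegal drug mentioned": 85,
}

def lift(rate, baseline=BASELINE):
    """Return (percentage-point increase, relative increase) over baseline."""
    return rate - baseline, (rate - baseline) / baseline

for label, rate in RATES.items():
    points, relative = lift(rate)
    print(f"{label}: +{points} points, {relative:.0%} relative increase")
```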

This information is useful in and of itself. It could let carriers more accurately set their rates and possibly make it easier to recover in subrogation when the other driver was on medication at the time of the accident. What makes this study really important, however, are the possibilities it points out for fraud investigations.

If a carrier could scour the narratives, emails, and reports on its tens of thousands of claims every year, it could use that information to determine which sets of facts warrant closer scrutiny for each particular type of claim. It could determine whether an insured is more likely to set an incendiary fire if he doesn’t live in the house for a month, has to rent out part of his house for the first time, or takes a certain percentage of his tenants to landlord-tenant court within a given time. It could let the carrier know the warning signs of an insured who doesn’t live in his own home or of an insured who doesn’t even own the piece of scheduled jewelry she has claimed was stolen. It could even look for common denominators, such as names of doctors or medical professionals in suspicious no-fault claims; when one is found to be fraudulent, all those who had contact with him could then be looked at more closely. These criteria could easily be kept up to date by consistently examining claims as they get closed out. Either way, as a heads-up or a look-back, they should prove invaluable in the hands of a skilled investigator, and that is something you should pay attention to.
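The "common denominator" idea is simple enough to sketch in Python. The claim numbers and provider names below are invented, and the sketch assumes the names have already been pulled out of the claim files by an extraction step; it only shows the look-back once one provider has been found to be fraudulent.

```python
from collections import defaultdict

# Hypothetical no-fault claims: (claim id, providers named in the file).
CLAIMS = [
    ("C-101", {"Dr. Adams", "Dr. Baker"}),
    ("C-102", {"Dr. Baker"}),
    ("C-103", {"Dr. Chen"}),
    ("C-104", {"Dr. Baker", "Dr. Chen"}),
]

def claims_by_provider(claims):
    """Build an index from each provider name to the claims that mention it."""
    index = defaultdict(set)
    for claim_id, providers in claims:
        for provider in providers:
            index[provider].add(claim_id)
    return index

index = claims_by_provider(CLAIMS)

# Suppose "Dr. Baker" is later found to be fraudulent: list every claim
# that mentions that name so an investigator can take a closer look.
to_review = sorted(index["Dr. Baker"])
print(to_review)
```

The same index can be rebuilt as claims are closed out, which is one way the criteria could be kept up to date.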