Celent Says

A New Data Mantra: Capture Everything

Craig Beattie
Insurance Experts' Forum, April 1, 2013

In conversations with insurers globally, we at Celent are hearing of a new approach to analytics. It's not called big data; it's a different approach, one that seeks to leverage data far more quickly and to be more tolerant of errors in the data. A shift toward the idea that all data is useful is occurring, but only in baby steps so far: I still often hear about truth, fact and consistent data in discussions.

When thinking about data, this idea of truth has always bothered me: the notion that system data represents the facts, the unassailable truth. One of the key activities in establishing classic analytics processes is deciding which data is the truth; there are always arguments about which data is accurate and can be trusted. In this process, inaccurate data (or data that doesn't contribute to this truth) is ignored, removed or lost. This leads to a negotiation, the end result of which is often called the single version of the truth: the output of a report that all stakeholders agree to. The strange thing about this process is that it acknowledges multiple viewpoints yet seeks a single truth regardless. Relational database design and modern user interfaces push us toward this line of thinking; after all, there is only one field to fill in, one answer to each question.

I suggest that there is value in capturing the half-truths and the outright lies, and that technology now lets us analyze them semantically.

It's easy to come up with examples from the insurance industry where we routinely accept that data is likely flawed. For instance, the original quote data says the vehicle is a standard build, but the claims adjuster spots the alloy wheels and rear parking sensors. In many motor claims, the insured states that there was a crash and that the other driver was in error; the other driver makes a similar statement, saying that there was a crash and the insured was at fault. Most modern systems capture all of this data: the different views over time, the different views from different stakeholders. Yet most systems and processes still assume that at a given point there is one set of valid data, one driver at fault.
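One way to honour those conflicting views is an append-only log of assertions rather than a single field that each update silently overwrites. The sketch below is a minimal illustration in Python; the Assertion class, the "CLM-001" identifier and the sample statements are all hypothetical, not a reference to any particular core system.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Assertion:
    """One stakeholder's view at one point in time; never overwritten."""
    claim_id: str
    source: str          # e.g. "insured", "other_driver", "adjuster"
    statement: str
    recorded_at: datetime

# The claim accumulates every view, contradictions included, instead of
# holding a single "at fault" field.
history = [
    Assertion("CLM-001", "insured", "other driver was in error",
              datetime(2012, 1, 1)),
    Assertion("CLM-001", "other_driver", "insured was at fault",
              datetime(2012, 1, 2)),
    Assertion("CLM-001", "adjuster", "alloy wheels and rear parking sensors",
              datetime(2012, 1, 5)),
]

# Both contradictory statements remain queryable side by side:
for a in history:
    print(f"{a.recorded_at:%Y-%m-%d}  {a.source}: {a.statement}")
```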

Now that customers are posting to social media, insurers face more questions: What if something an insured stated at the time of purchase is contradicted by their Facebook profile? Was that tweet accurate, or just posturing on the part of the customer? How should the insurer, or rather the automated systems analyzing this data, treat these contrary positions?

There is factual data that is true: the fact that the witness statements were made, the date and time when they were captured, who made them, and regarding what case. What of the pertinent data, though, the data the humans actually use in determining the case or what should be done next, the data that allows us to reason about the case and make a judgement? This information is typically stored in free-text formats, requiring humans to interpret the data and do what humans do well: establish hypotheses and test them, ultimately selecting the one they feel fits best and recording that result as fact. For example, it's a fact that Bob, the claims handler, felt on Jan. 1, 2012 that the insured wasn't at fault, but is that what is recorded? Or is it a bare assertion about fault, recorded as the truth rather than as a hypothesis, with nothing but an audit trail of who updated the system?
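The distinction shows up at the schema level. Here is a hedged sketch of the two recording styles; the field names and the "CLM-001" identifier are invented for illustration, not drawn from any real system.

```python
from datetime import datetime

# Lossy recording: the hypothesis masquerades as fact. Who formed the
# view, and when, is gone.
claim = {"at_fault": False}

# Faithful recording: the only hard fact is that Bob formed this view on
# Jan. 1, 2012. The view itself stays labelled as a revisable hypothesis.
judgement = {
    "case_id": "CLM-001",        # hypothetical identifier
    "kind": "hypothesis",        # explicitly not a fact about the crash
    "made_by": "Bob",
    "made_at": datetime(2012, 1, 1),
    "content": "insured not at fault",
}
```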

If the big data movement, along with the exploits of Google, Amazon and the like, has taught us anything, it is that all data is useful. Capture everything.

Why, you ask? One example: there exist algorithms and systems that enable the analysis of competing hypotheses, capturing how credible or likely an assertion is based on the believability of the source of the underlying data. What if your system could highlight how plausible the insured's statement is, or a witness's testimony, or data from a third party, based on the information at hand? What if your core system presented options rather than an answer derived by assuming everything in the system is correct?
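To make that concrete, here is a deliberately simple credibility-weighted tally over competing hypotheses. It is a toy, not the full analysis-of-competing-hypotheses method; the sources, the credibility weights and the hypotheses are illustrative assumptions.

```python
from collections import defaultdict

# Assumed prior credibility of each source; a real system would calibrate
# these from history rather than hard-code them.
credibility = {"insured": 0.5, "other_driver": 0.5,
               "witness": 0.8, "telematics": 0.95}

# Each piece of evidence and the hypothesis it supports.
evidence = [
    ("insured", "other driver at fault"),
    ("other_driver", "insured at fault"),
    ("witness", "other driver at fault"),
    ("telematics", "other driver at fault"),
]

# Score each hypothesis by the credibility of the sources backing it.
scores = defaultdict(float)
for source, hypothesis in evidence:
    scores[hypothesis] += credibility[source]

total = sum(scores.values())
for hypothesis, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis}: {score / total:.0%}")
```

The point of the design is the output: ranked, scored options that a handler can challenge, rather than a single field asserted as the truth.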

Truth, then, is not something best derived from raw data after the fact, but something that requires consideration as the data is being collected. Data, knowledge and information collected in the right way will allow future systems to help insurer staff reason about the data and be more effective. The insurance industry is sitting on a gold mine of raw data, and as it starts to mine that data for new insights, I suggest insurers seek new models to better understand the knowledge therein.

The insurers that emerge as leaders will capture all the data they can, will understand that some of it is contradictory, and will model it in such a way that software can support decisions about the data rather than leaving the grey areas to human operators.

What’s your view: Is there a single version of the truth in insurance? How are you dealing with contrary data? Have you already solved this?

This blog has been reprinted with permission from Celent.

Craig Beattie is an analyst in Celent's insurance group, and can be reached at cbeattie@celent.com.

Readers are encouraged to respond to Craig using the “Add Your Comments” box below.

The opinions posted in this blog do not necessarily reflect those of Insurance Networking News or SourceMedia.

Comments (2)

Technologies allowing for analysis of natural language or free-form text are growing increasingly mature as they are leveraged in Big Data. Gone are the days of simple keyword analysis and ontologies to decipher whether a tweet is positive, negative or neutral; now we're seeing increasingly complex models analysing the text.

Look to semantic technologies and ontology analysis for examples of this kind of analysis. Coupled with Big Data approaches, this can yield fast results over a diverse set of data.

Posted by: Craig B | April 10, 2013 8:54 AM

How does a company analyze free-form text stored as script? Doesn't evaluation generally require comparisons of information with outcomes?

Posted by: Chas B | April 5, 2013 12:29 PM
