Enterprising Developments

Do You Really Need a 'Data Lake' in Your Back Yard?

Joe McKendrick
Insurance Experts' Forum, June 16, 2014

Vendors in the data management space lately have been flinging about a new term, “data lake,” to describe the vast pools of data that are forming across enterprises.

What, exactly, is a data lake, and how does it differ from a virtualized data layer, or cloud-based storage? While it sounds an awful lot like the latest marketing buzzword, it actually offers an alternative to enterprise data warehouses. And, let's face it, a “lake” evokes much more positive imagery than that of a “warehouse.”

Data lakes are part of the Apache Hadoop ecosystem, serving as low-cost repositories for data of all types and sizes. Since data can be quickly poured into them with little fuss or muss, they're relatively low cost to operate, unlike data warehouses, which require ETL, cleaning and normalization of data.

“Data must be converted into recognizable formats – a laborious time-consuming process that becomes increasingly impractical as data collections grow larger,” note Mark Herman and Michael Delurey, both of Booz Allen Hamilton, in a recent paper.

Essentially, all data coming into the organization, regardless of whether it's structured or unstructured, is assembled into a single, large table. Herman and Delurey liken this centralized table to a gigantic spreadsheet with billions of rows and billions of columns. This makes all data available at once to any and all queries.
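To make the "gigantic spreadsheet" idea concrete, here is a minimal sketch in Python. It is a toy illustration only, not how Hadoop or any vendor product implements a data lake. The WideTable class, its ingest and query methods, and the sample records are all hypothetical names invented for this example. The point it tries to show: records arrive in whatever shape they have, each row fills in only the columns it actually uses, and a single query can reach across every source.

```python
class WideTable:
    """Toy sparse table: each row stores only the columns it actually has.

    Hypothetical illustration; not an API from Hadoop or any data lake product.
    """

    def __init__(self):
        self.rows = []        # one {column: value} dict per ingested record
        self.columns = set()  # every column name seen so far

    def ingest(self, source, record):
        # No upfront schema or conversion: keep the record as-is,
        # tagged with the system it came from.
        row = {"source": source, **record}
        self.rows.append(row)
        self.columns.update(row)

    def query(self, **criteria):
        # Return rows matching every criterion; a row that lacks
        # a requested column never matches.
        return [
            row for row in self.rows
            if all(key in row and row[key] == value
                   for key, value in criteria.items())
        ]


# Made-up sample records from three different kinds of systems.
lake = WideTable()
lake.ingest("agent_notes", {"claim_id": "C-1", "text": "Roof damage after hail"})
lake.ingest("telematics", {"policy_id": "P-9", "speed_mph": 42})
lake.ingest("call_center", {"claim_id": "C-1", "duration_sec": 310})

# One query spans all sources at once.
claim_rows = lake.query(claim_id="C-1")
print([row["source"] for row in claim_rows])
```

In this sketch the table grows a new column whenever a record brings one, which is the trade-off the data lake makes: loading is cheap, and the work of interpreting the data is deferred until someone queries it.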

“One of the main appeals of data lakes is that they incorporate data from any source, from social media to clickstream data, into a single location that empowers enterprises to capitalize on this information,” write Cesar Rojas of Teradata and Audrey Ng of Hortonworks in a new report published by The Data Warehousing Institute (TDWI).

What's the advantage of data lakes to insurance companies? Much of the information that is valuable to policyholder application and claims administration processes is unstructured: notes from agents, call center notes, photos of properties before and after damage, sensor data from telematics, geospatial data and social media data, just to name a few. The ability to keep all this information together, rather than scattered across separate systems such as content management and policy administration, may enable faster access at lower cost.

In a separate report, Teradata and Hortonworks provide key steps for data lake development:

1) Get the plumbing in place. As Hadoop is rolled out, the data lake can start as a small pilot project. In the meantime, everyone learns how to make this new way of looking at data work.

2) Build transformation and analytics muscle. “The second stage involves improving the ability to transform and analyze data,” the report notes. “In this stage, companies find the tools that are most appropriate to their skillset and start acquiring more data and building applications. Capabilities from the enterprise data warehouse and the data lake are used together.”

3) Broaden the operational impact. “The third stage involves getting data and analytics into the hands of as many people as possible,” the report states.

4) Add enterprise capabilities. “In this highest stage of the data lake, enterprise capabilities are added to the data lake. Few companies have reached this level of maturity, but many will as the use of big data grows, requiring governance, compliance, security, and auditing.”
