Enterprising Developments

Do You Really Need a 'Data Lake' in Your Back Yard?

Joe McKendrick
Insurance Experts' Forum, June 16, 2014

Vendors in the data management space lately have been flinging about a new term, “data lake,” to describe the vast pools of data that are forming across enterprises.

What, exactly, is a data lake, and how does it differ from a virtualized data layer, or cloud-based storage? While it sounds an awful lot like the latest marketing buzzword, it actually offers an alternative to enterprise data warehouses. And, let's face it, a “lake” evokes much more positive imagery than a “warehouse.”

Data lakes are part of the Apache Hadoop ecosystem, serving as repositories for data of all types and sizes. Because data can be poured into them quickly with little fuss or muss, they're relatively inexpensive to operate, unlike data warehouses, which require extract, transform and load (ETL) processing, cleaning and normalization of data.

“Data must be converted into recognizable formats – a laborious time-consuming process that becomes increasingly impractical as data collections grow larger,” note Mark Herman and Michael Delurey, both of Booz Allen Hamilton, in a recent paper.

In a data lake, by contrast, all data coming into the organization – regardless of whether it's structured or unstructured – is assembled into a single, large table. Herman and Delurey liken this centralized table to a gigantic spreadsheet with billions of rows and billions of columns. This makes all data available at once to any and all queries.
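
To make the "gigantic spreadsheet" idea concrete, here is a minimal Python sketch of a single, sparse table that accepts records of any shape and applies structure only when a query runs. It is a toy illustration, not Hadoop code: the WideTable class, its ingest and query methods, and the sample claim and telematics records are all hypothetical names invented for this example.

```python
class WideTable:
    """Toy stand-in for a data lake's single large table (hypothetical, not a Hadoop API)."""

    def __init__(self):
        # Each row is a sparse mapping: only the columns that row actually has are stored.
        self.rows = {}  # row_id -> {column: value}

    def ingest(self, row_id, record):
        # No schema is enforced on the way in; new columns simply appear.
        self.rows.setdefault(row_id, {}).update(record)

    def query(self, predicate):
        # Structure is interpreted only at query time.
        return {rid: row for rid, row in self.rows.items() if predicate(row)}

    def columns(self):
        # The set of columns is whatever the ingested records happened to contain.
        return sorted({col for row in self.rows.values() for col in row})


lake = WideTable()
# Made-up sample records of different shapes, all landing in the same table.
lake.ingest("claim-1", {"policy_id": "P100", "agent_note": "Roof damage after hail"})
lake.ingest("claim-1", {"photo_uri": "photos/claim-1/after.jpg"})
lake.ingest("trip-7", {"policy_id": "P100", "max_speed_mph": 71})

# One query reaches across every kind of record tied to a policy.
matches = lake.query(lambda row: row.get("policy_id") == "P100")
```

In this sketch the agent note, the photo reference and the telematics reading never had to be converted to a common format before loading, which is the contrast with a warehouse that the authors draw.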

“One of the main appeals of data lakes is that they incorporate data from any source, from social media to clickstream data, into a single location that empowers enterprises to capitalize on this information,” write Cesar Rojas of Teradata and Audrey Ng of Hortonworks in a new report published by The Data Warehousing Institute (TDWI).

What's the advantage of data lakes to insurance companies? Much of the information that is valuable to policyholder application and claims administration processes is unstructured: notes from agents, call center notes, photos of properties before and after damage, sensor data from telematics, geospatial data and social media data, just to name a few. The ability to put all this information together, rather than leaving it scattered across separate systems such as content management and policy administration, may enable faster access at lower cost.

In a separate report, Teradata and Hortonworks provide key steps for data lake development:

1) Get the plumbing in place. As Hadoop is rolled out, the data lake can start as a small pilot project. In the meantime, everyone learns how to make this new way of looking at data work.

2) Build transformation and analytics muscle. “The second stage involves improving the ability to transform and analyze data,” the report notes. “In this stage, companies find the tools that are most appropriate to their skillset and start acquiring more data and building applications. Capabilities from the enterprise data warehouse and the data lake are used together.”

3) Broaden the operational impact. “The third stage involves getting data and analytics into the hands of as many people as possible,” the report states.

4) Add enterprise capabilities. “In this highest stage of the data lake, enterprise capabilities are added to the data lake. Few companies have reached this level of maturity, but many will as the use of big data grows, requiring governance, compliance, security, and auditing.”
