Enterprising Developments

Do You Really Need a 'Data Lake' in Your Back Yard?

Joe McKendrick
Insurance Experts' Forum, June 16, 2014

Vendors in the data management space lately have been flinging about a new term, “data lake,” to describe the vast pools of data that are forming across enterprises.

What, exactly, is a data lake, and how does it differ from a virtualized data layer, or cloud-based storage? While it sounds an awful lot like the latest marketing buzzword, it actually offers an alternative to enterprise data warehouses. And, let's face it, a “lake” evokes much more positive imagery than that of a “warehouse.”

Data lakes are part of the Apache Hadoop ecosystem, serving as low-cost repositories for data of all types and sizes. Since data can be quickly poured into them with little fuss or muss, they're relatively low cost to operate, unlike data warehouses, which require ETL, cleaning and normalization of data.
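
To make the contrast concrete, here is a minimal Python sketch of the two loading styles. It is illustrative only and does not use Hadoop: the column names, the `load_into_warehouse` and `pour_into_lake` helpers, and the sample records are all hypothetical, chosen to show why a warehouse load can reject data that a lake simply accepts.

```python
import json

# Hypothetical fixed schema that a warehouse table might enforce.
WAREHOUSE_COLUMNS = {"policy_id": str, "claim_amount": float}

def load_into_warehouse(record, table):
    """Warehouse style: convert and validate every field before storing."""
    row = {}
    for column, cast in WAREHOUSE_COLUMNS.items():
        if column not in record:
            raise ValueError(f"missing required column: {column}")
        row[column] = cast(record[column])
    table.append(row)

def pour_into_lake(raw, source, lake):
    """Lake style: store the payload as-is, tagged only with its source."""
    lake.append({"source": source, "raw": raw})

warehouse, lake, rejected = [], [], []
incoming = [
    ("claims_system", json.dumps({"policy_id": "P-100", "claim_amount": "1250.50"})),
    ("call_center", "Caller reports hail damage to roof."),
]

for source, raw in incoming:
    # The lake accepts everything without inspection.
    pour_into_lake(raw, source, lake)
    # The warehouse only accepts what parses and fits its schema.
    try:
        load_into_warehouse(json.loads(raw), warehouse)
    except (ValueError, TypeError):
        rejected.append(source)

print(len(lake), len(warehouse), rejected)
```

In this sketch the free-text call center note lands in the lake untouched, while the warehouse path would need extra conversion work before it could store that note at all.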


“Data must be converted into recognizable formats – a laborious time-consuming process that becomes increasingly impractical as data collections grow larger,” note Mark Herman and Michael Delurey, both of Booz Allen Hamilton, in a recent paper.

Essentially, all data coming into the organization – regardless of whether it's structured or unstructured – is assembled into a single, large table. Herman and Delurey liken this centralized table to a gigantic spreadsheet with billions of rows and billions of columns. This makes all data available at once to any and all queries.
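
The "gigantic spreadsheet" idea can be pictured with a small Python sketch. This is a toy model under stated assumptions, not how any particular Hadoop product stores data: `lake_table`, `all_columns`, `rows_with` and `query` are hypothetical names, and each row is a plain dictionary that fills in only the columns its source happens to supply.

```python
# One table for everything; each row keeps only the columns its source supplies.
lake_table = [
    {"source": "agent_notes", "policy_id": "P-100", "text": "Roof inspected after storm."},
    {"source": "telematics", "policy_id": "P-200", "speed_mph": 62, "lat": 41.88, "lon": -87.63},
    {"source": "social_media", "handle": "@example", "text": "Great claims service!"},
]

def all_columns(table):
    """Every column that appears anywhere in the table (the spreadsheet's header row)."""
    return sorted({column for row in table for column in row})

def rows_with(table, column):
    """Rows that have any value for the given column."""
    return [row for row in table if column in row]

def query(table, **conditions):
    """Rows matching every condition; a row lacking a column never matches on it."""
    return [
        row for row in table
        if all(c in row and row[c] == v for c, v in conditions.items())
    ]

print(all_columns(lake_table))
print(rows_with(lake_table, "text"))
print(query(lake_table, policy_id="P-100"))
```

Because structure is applied only when a question is asked, a new source with new columns can be added without redesigning the table, which is the property the spreadsheet analogy is pointing at.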

“One of the main appeals of data lakes is that they incorporate data from any source, from social media to clickstream data, into a single location that empowers enterprises to capitalize on this information,” write Cesar Rojas of Teradata and Audrey Ng of Hortonworks in a new report published by The Data Warehousing Institute (TDWI).

What's the advantage of data lakes to insurance companies? Much of the data that is valuable to the policyholder application and claims administration processes is unstructured: notes from agents, call center notes, photos of properties before and after damage, sensor data from telematics, geospatial data and social media data, to name a few. The ability to put all this information together, rather than leaving it in separate systems such as content management and policy administration, may enable faster access at lower cost.

In a separate report, Teradata and Hortonworks provide key steps for data lake development:

1) Get the plumbing in place. As Hadoop is rolled out, the data lake can start as a small pilot project. In the meantime, everyone learns how to make this new way of looking at data work.

2) Build transformation and analytics muscle. “The second stage involves improving the ability to transform and analyze data,” the report notes. “In this stage, companies find the tools that are most appropriate to their skillset and start acquiring more data and building applications. Capabilities from the enterprise data warehouse and the data lake are used together.”

3) Broaden the operational impact. “The third stage involves getting data and analytics into the hands of as many people as possible,” the report states.

4) Add enterprise capabilities. “In this highest stage of the data lake, enterprise capabilities are added to the data lake. Few companies have reached this level of maturity, but many will as the use of big data grows, requiring governance, compliance, security, and auditing.”
