Data is the Next Intel Inside... - Tim O’Reilly

Friday, October 10, 2014

3 Roadblocks to Big Data ROI

Most organizations that implement big data platforms expect to derive significant value from their investment. But nearly half of these firms aren't achieving the level of value or return on investment (ROI) that they had expected.

According to a new study by Wikibon, an open-source research firm that competes with Gartner and Forrester, the ROI of these big-data projects is proving to be a big letdown for most enterprises.

"In the long term, they expect USD 3 to USD 4 return on investment for every dollar. But based on our analysis, the average company right now is getting a return of about 55 cents on the dollar," said Jeffrey F. Kelly, Wikibon principal research contributor, in a phone interview with InformationWeek.

Wikibon bases its findings on multiple information sources, including conversations with big data vendors and service providers, feedback from the Wikibon community, and results from a survey of nearly 100 "big data practitioners," the firm said.

Forty-six percent of survey respondents reported that they've realized only "partial value" from their big data deployments, while 2% called their deployments "total failures, with no value achieved," the report states.

So what's the problem? Wikibon identified three key reasons for companies' inability to achieve maximum ROI from big data.

1. Lack of skilled big-data experts

The data scientist shortage is a well-chronicled phenomenon and one that might persist for some time.

"In terms of the lack of skilled practitioners, I don't see that changing anytime soon," said Kelly.

A company's existing staff, such as a database administrator (DBA) with years of Oracle experience, probably lacks the skills to manage big data technologies like Hadoop, he added. In the short term, this dilemma provides an opportunity for big-data services firms to fill the gap.

2. Immature technology

Big-data tools are in their infancy. They require refinement for use by a wider range of business workers -- not just highly trained data scientists -- a problem that many software developers are working to solve.

3. Lack of a compelling need

Enterprises often invest in big-data projects without tying these efforts to specific and measurable business applications.

"In such cases, largely driven by IT departments, enterprises begin amassing large volumes of data in Hadoop, which is sometimes made available to data scientists and business analysts for exploratory analysis, but that otherwise sits underutilized," says the Wikibon report.

"A lot of these deployments are driven by IT departments, which sometimes are looking to offload some of the workload from their existing relational systems," said Kelly. "Basically they load in a lot of data, and make it available to their data scientists and analysts to do some exploratory analysis. You've got a lot of experimenting going on, but no real business application tied to it."

To overcome these big-data obstacles, the Wikibon report advises businesses to consider professional services organizations, cloud services or both. It's also important to clearly define a project's goals before you begin.

"Generally we recommend that enterprises start with small, strategic [projects]. Pick a very discrete use case, something that's going to be fairly easy to measure," said Kelly. "Do it in an area that's strategic to your business rather than a peripheral use case."

He added: "Most of the successful projects we've seen are not initiated by IT, but are driven more by line of business departments, either marketing or finance."

Gartner debunks five Big Data myths

With so much hype about big data, it's hard for IT leaders to know how to exploit its potential. Gartner, Inc. dispels five myths to help IT leaders evolve their information infrastructure strategies. 

"Big data offers big opportunities, but poses even bigger challenges. Its sheer volume doesn't solve the problems inherent in all data," said Alexander Linden, research director at Gartner. "IT leaders need to cut through the hype and confusion, and base their actions on known facts and business-driven outcomes."

Myth No. 1: Everyone Is Ahead of Us in Adopting Big Data

Interest in big data technologies and services is at a record high, with 73 percent of the organizations Gartner surveyed in 2014 investing or planning to invest in them. But most organizations are still in the very early stages of adoption — only 13 percent of those we surveyed had actually deployed these solutions.

The biggest challenges that organizations face are to determine how to obtain value from big data, and how to decide where to start. Many organizations get stuck at the pilot stage because they don't tie the technology to business processes or concrete use cases.

Myth No. 2: We Have So Much Data, We Don't Need to Worry About Every Little Data Flaw 

IT leaders believe that the huge volume of data that organizations now manage makes individual data quality flaws insignificant due to the "law of large numbers." Their view is that individual data quality flaws don't influence the overall outcome when the data is analyzed because each flaw is only a tiny part of the mass of data in their organization. 

"In reality, although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data," said Ted Friedman, vice president and distinguished analyst at Gartner. "Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data." 
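Friedman's arithmetic can be checked with a few lines of Python. This is only a rough sketch: the 2 percent flaw rate and the dataset sizes are made-up numbers for illustration, not figures from Gartner, and the function name flaw_impact is hypothetical. It assumes the share of bad records stays constant as the dataset grows.

```python
def flaw_impact(total_records: int, flaw_rate: float) -> dict:
    """Return the weight of one flaw and the overall flawed share.

    flaw_rate is an assumed, constant fraction of bad records (hypothetical).
    """
    flawed_records = round(total_records * flaw_rate)
    return {
        "flawed_records": flawed_records,
        # One bad record matters less as the dataset grows ...
        "weight_of_one_flaw": 1 / total_records,
        # ... but the number of bad records grows too, so the share is unchanged.
        "overall_flawed_share": flawed_records / total_records,
    }


small = flaw_impact(10_000, 0.02)       # hypothetical "before big data" dataset
large = flaw_impact(10_000_000, 0.02)   # hypothetical big data dataset

for name, result in (("small", small), ("large", large)):
    print(name, result)
```

Under that assumption, each flaw in the larger dataset carries a thousandth of the weight, yet the flawed share of the whole stays the same. If outside data pushes the flaw rate up, as Friedman suggests, the share gets worse rather than better.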

Myth No. 3: Big Data Technology Will Eliminate the Need for Data Integration 

The general view is that big data technology — specifically the potential to process information via a "schema on read" approach — will enable organizations to read the same sources using multiple data models. Many people believe this flexibility will enable end users to determine how to interpret any data asset on demand. It will also, they believe, provide data access tailored to individual users.

In reality, most information users rely significantly on "schema on write" scenarios in which data is described, content is prescribed, and there is agreement about the integrity of data and how it relates to the scenarios. 
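For readers unfamiliar with the two terms, here is a minimal sketch of the difference in plain Python. Everything in it is hypothetical: the sample records, the field names and the functions read_as_sales, read_as_activity and write_validated are invented for illustration and do not come from Gartner or from any particular big data product.

```python
import json

# Raw records stored exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "ts": "2014-10-01"}',
    '{"user": "b2", "amount": "5", "ts": "2014-10-02"}',
]


def read_as_sales(lines):
    """Schema on read: interpret the raw records as (user, amount) at query time."""
    return [(r["user"], float(r["amount"])) for r in map(json.loads, lines)]


def read_as_activity(lines):
    """Schema on read: a second model applied to the very same raw records."""
    return [(r["ts"], r["user"]) for r in map(json.loads, lines)]


# Schema on write: the structure is agreed on before anything is stored.
AGREED_SCHEMA = {"user": str, "amount": float, "ts": str}


def write_validated(store, record):
    """Reject any record that does not match the agreed schema."""
    for field, expected_type in AGREED_SCHEMA.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    store.append(record)


warehouse = []
write_validated(warehouse, {"user": "a1", "amount": 19.99, "ts": "2014-10-01"})

print(read_as_sales(RAW_EVENTS))
print(read_as_activity(RAW_EVENTS))
print(warehouse)
```

In the schema-on-read half, every reader decides for itself what "amount" means. In the schema-on-write half, a record with the amount stored as text would be rejected before it ever reached the store, which is the kind of agreement about integrity that the paragraph above says most information users still rely on.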

Myth No. 4: It's Pointless Using a Data Warehouse for Advanced Analytics
  
Many information management (IM) leaders consider building a data warehouse to be a time-consuming and pointless exercise when advanced analytics use new types of data beyond the data warehouse. The reality is that many advanced analytics projects use a data warehouse during the analysis. In other cases, IM leaders must refine new data types that are part of big data to make them suitable for analysis. They have to decide which data is relevant, how to aggregate it, and the level of data quality necessary — and this data refinement can happen in places other than the data warehouse. 

Myth No. 5: Data Lakes Will Replace the Data Warehouse 

Vendors market data lakes as enterprisewide data management platforms for analyzing disparate sources of data in their native formats. 

In reality, it's misleading for vendors to position data lakes as replacements for data warehouses or as critical elements of customers' analytical infrastructure. A data lake's foundational technologies lack the maturity and breadth of the features found in established data warehouse technologies. "Data warehouses already have the capabilities to support a broad variety of users throughout an organization. IM leaders don't have to wait for data lakes to catch up," said Nick Heudecker, research director at Gartner.

Read more at: http://www.informationweek.in/informationweek/news-analysis/298061/gartner-debunks-myths?utm_source=referrence_article