Data is the Next Intel Inside... - Tim O’Reilly

Friday, October 10, 2014

3 Roadblocks to Big Data ROI

Most organizations that implement big data platforms expect to derive significant value from their investment. But nearly half of these firms aren't achieving the level of value or return on investment (ROI) that they had expected.

According to a new study by Wikibon, an open-source research firm that competes with Gartner and Forrester, the ROI of these big-data projects is proving to be a big letdown for most enterprises.

"In the long term, they expect USD 3 to USD 4 return on investment for every dollar. But based on our analysis, the average company right now is getting a return of about 55 cents on the dollar," said Jeffrey F. Kelly, Wikibon principal research contributor, in a phone interview with InformationWeek.

Wikibon bases its findings on multiple information sources, including conversations with big data vendors and service providers, feedback from the Wikibon community, and results from a survey of nearly 100 "big data practitioners," the firm said.

Forty-six percent of survey respondents reported that they've realized only "partial value" from their big data deployments, while 2% called their deployments "total failures, with no value achieved," the report states.

So what's the problem? Wikibon identified three key reasons for companies' inability to achieve maximum ROI from big data.

1. Lack of skilled big-data experts

The data scientist shortage is a well-chronicled phenomenon and one that might persist for some time.

"In terms of the lack of skilled practitioners, I don't see that changing anytime soon," said Kelly.

A company's existing staff, such as a database administrator (DBA) with years of Oracle experience, probably lacks the skills to manage big data technologies like Hadoop, he added. In the short term, this dilemma provides an opportunity for big-data services firms to fill the gap.

2. Immature technology

Big-data tools are in their infancy. They require refinement for use by a wider range of business workers -- not just highly trained data scientists -- a problem that many software developers are working to solve.

3. Lack of a compelling need

Enterprises often invest in big-data projects without tying these efforts to specific and measurable business applications.

"In such cases, largely driven by IT departments, enterprises begin amassing large volumes of data in Hadoop, which is sometimes made available to data scientists and business analysts for exploratory analysis, but that otherwise sits underutilized," says the Wikibon report.

"A lot of these deployments are driven by IT departments, which sometimes are looking to offload some of the workload from their existing relational systems," said Kelly. "Basically they load in a lot of data, and make it available to their data scientists and analysts to do some exploratory analysis. You've got a lot of experimenting going on, but no real business application tied to it."

To overcome these big-data obstacles, the Wikibon report advises businesses to consider professional services organizations, cloud services or both. It's also important to clearly define a project's goals before you begin.

"Generally we recommend that enterprises start with small, strategic [projects]. Pick a very discrete use case, something that's going to be fairly easy to measure," said Kelly. "Do it in an area that's strategic to your business rather than a peripheral use case."

He added: "Most of the successful projects we've seen are not initiated by IT, but are driven more by line of business departments, either marketing or finance."

Gartner debunks five Big Data myths

With so much hype about big data, it's hard for IT leaders to know how to exploit its potential. Gartner, Inc. dispels five myths to help IT leaders evolve their information infrastructure strategies. 

"Big data offers big opportunities, but poses even bigger challenges. Its sheer volume doesn't solve the problems inherent in all data," said Alexander Linden, research director at Gartner. "IT leaders need to cut through the hype and confusion, and base their actions on known facts and business-driven outcomes."

Myth No. 1: Everyone Is Ahead of Us in Adopting Big Data 

Interest in big data technologies and services is at a record high, with 73 percent of the organizations Gartner surveyed in 2014 investing or planning to invest in them. But most organizations are still in the very early stages of adoption — only 13 percent of those surveyed had actually deployed these solutions.

The biggest challenges that organizations face are to determine how to obtain value from big data, and how to decide where to start. Many organizations get stuck at the pilot stage because they don't tie the technology to business processes or concrete use cases.

Myth No. 2: We Have So Much Data, We Don't Need to Worry About Every Little Data Flaw 

IT leaders believe that the huge volume of data that organizations now manage makes individual data quality flaws insignificant due to the "law of large numbers." Their view is that individual data quality flaws don't influence the overall outcome when the data is analyzed because each flaw is only a tiny part of the mass of data in their organization. 

"In reality, although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data," said Ted Friedman, vice president and distinguished analyst at Gartner. "Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data." 
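Friedman's argument can be checked with simple arithmetic: if the per-record flaw rate stays constant, the absolute number of flawed records grows in step with the dataset, so the overall impact never shrinks. The sketch below is a hypothetical illustration with a made-up 2% flaw rate, not a figure from the Gartner research.

```python
# Hypothetical illustration: a constant per-record flaw rate means the
# absolute number of flawed records grows linearly with the dataset.
flaw_rate = 0.02  # assume 2% of records are flawed

for n_records in (1_000, 1_000_000, 1_000_000_000):
    flawed = int(n_records * flaw_rate)
    print(f"{n_records:>13,} records -> {flawed:>11,} flawed ({flaw_rate:.0%})")
```

At a billion records, a "tiny" 2% rate still means 20 million flawed records — which is why the proportional impact on any analysis stays the same.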

Myth No. 3: Big Data Technology Will Eliminate the Need for Data Integration 

The general view is that big data technology — specifically the potential to process information via a "schema on read" approach — will enable organizations to read the same sources using multiple data models. Many people believe this flexibility will enable end users to determine how to interpret any data asset on demand. It will also, they believe, provide data access tailored to individual users.

In reality, most information users rely significantly on "schema on write" scenarios in which data is described, content is prescribed, and there is agreement about the integrity of data and how it relates to the scenarios. 
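The contrast between the two approaches can be shown in a few lines. This is a minimal sketch with made-up field names; real deployments would use tools such as Hive or Avro schemas rather than hand-rolled validation.

```python
import json

# Schema on write: structure is agreed before data is stored, so every
# record is validated (and rejected or corrected) at load time.
EXPECTED_FIELDS = {"customer_id", "amount"}  # hypothetical agreed schema

def write_with_schema(record: dict) -> dict:
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record rejected, missing fields: {missing}")
    return record

# Schema on read: raw text is stored as-is; each consumer imposes its own
# interpretation at query time, so disagreements only surface later.
raw_line = '{"customer_id": 42, "amount": "19.99"}'

def read_with_schema(line: str) -> float:
    rec = json.loads(line)               # structure imposed only now
    return float(rec.get("amount", 0))   # this reader decides amount is numeric

print(write_with_schema({"customer_id": 42, "amount": 19.99}))
print(read_with_schema(raw_line))
```

The flexibility of schema-on-read comes at the cost of every consumer re-deciding what the data means — which is exactly why most users still depend on the agreed, schema-on-write view.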

Myth No. 4: It's Pointless Using a Data Warehouse for Advanced Analytics
  
Many information management (IM) leaders consider building a data warehouse to be a time-consuming and pointless exercise when advanced analytics use new types of data beyond the data warehouse. The reality is that many advanced analytics projects use a data warehouse during the analysis. In other cases, IM leaders must refine new data types that are part of big data to make them suitable for analysis. They have to decide which data is relevant, how to aggregate it, and the level of data quality necessary — and this data refinement can happen in places other than the data warehouse. 

Myth No. 5: Data Lakes Will Replace the Data Warehouse 

Vendors market data lakes as enterprisewide data management platforms for analyzing disparate sources of data in their native formats. 

In reality, it's misleading for vendors to position data lakes as replacements for data warehouses or as critical elements of customers' analytical infrastructure. A data lake's foundational technologies lack the maturity and breadth of the features found in established data warehouse technologies. "Data warehouses already have the capabilities to support a broad variety of users throughout an organization. IM leaders don't have to wait for data lakes to catch up," said Nick Heudecker, research director at Gartner.

Read more at: http://www.informationweek.in/informationweek/news-analysis/298061/gartner-debunks-myths?utm_source=referrence_article

Tuesday, April 22, 2014

Have you ever heard of "Dark Data" ?

I am sure that many of us are hearing the term "Dark Data" for the first time!
So what is this term all about? Let's find out.
It sounds like an ominous plot by some evil mastermind intent on world domination.  But don’t worry, "dark data" is more benign than the name suggests.
Although it collects in unlit corners and neglected back rooms, dark data is not a serious threat to your business. In fact, it might be more properly termed “dusty data.”
It’s that neglected data that accumulates in log files and archives that nobody knows what to do with. Although it never sees the light of day, no one feels comfortable destroying it because it might prove useful someday.
Will It Be the “Someday” You Have Been Waiting For?
With all the recent press about the value of big data, you may be thinking that now is the time to dive into the secrets of the dark data hiding in your organization.
But before you invest in expanded storage capacity or sophisticated data analytic tools, take time to ask the big questions first – the ones that seek out the real value of the data for your business.
The authors of the CIO.com ebook, Big Data Analysis: What Every CIO Should Know, suggest that you start with such blue-sky questions as:
  • If only we knew . . . .
  • If we could predict . . . .
  • If we could measure . . . .

Determine what information you need in order to answer those high-value questions, and use that as the standard by which you evaluate all the available data, including the dark data that has never been a part of your regular business operations.

Is Your Dark Data a Business Intelligence Gold Mine?

By itself, some of that dark data may not have much value, but combine it with data you already collect or purchase and you may have a digital gold mine. Those web log files that were once just digital clutter could be the key to unlocking changing patterns in customer behavior that can put you ahead of your competition.
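As a toy example of that kind of unlock, a handful of neglected web-log lines can be joined against a customer table the business already maintains. Everything here — log format, customer IDs, company names — is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical web-server log lines (the "dark data") and a customer
# table the business already keeps; all values are made up.
LOG_LINES = [
    '192.0.2.10 - cust_007 [22/Apr/2014] "GET /pricing HTTP/1.1" 200',
    '192.0.2.11 - cust_003 [22/Apr/2014] "GET /pricing HTTP/1.1" 200',
    '192.0.2.10 - cust_007 [23/Apr/2014] "GET /cancel HTTP/1.1" 200',
]
CUSTOMERS = {"cust_007": "Acme Corp", "cust_003": "Globex"}

LOG_PATTERN = re.compile(r'\S+ - (\S+) \[[^\]]+\] "GET (\S+)')

# Join the log data with known customers to surface behavior patterns.
visits = Counter()
for line in LOG_LINES:
    m = LOG_PATTERN.match(line)
    if m and m.group(1) in CUSTOMERS:
        visits[(CUSTOMERS[m.group(1)], m.group(2))] += 1

for (name, page), count in sorted(visits.items()):
    print(f"{name} viewed {page} x{count}")
```

On its own, each log line is clutter; joined with the customer table, the visit to /cancel becomes an early churn signal worth acting on.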
By taking the time to assess the value to your business and investing in the tools you need to shine a light on dark data, you may be able to turn those digital “black holes” into real business intelligence that you can put in the hands of your decision-makers.
Even if you determine that it has negligible value for business intelligence, you have accomplished something of merit. Now that you have established the business case for freeing up IT resources wasted on maintaining low-value data, you’re free, at last, to hit the delete key.

Thursday, April 17, 2014

Points to remember for building a Big Data supply chain

Big data can have a large impact on the supply chain, and that is exactly how the majority of supply chain executives think about it. Sales data, product sensor data, market information, world events and news, competitor data, and weather conditions can all yield insight into the expected demand for products used or required in the supply chain. With predictive algorithms, inventory can be optimized for Just-in-Time delivery based on real-time demand forecasts. Collaboration among the different players within the supply chain can help shape demand for all participating organizations and deliver a better B2B and B2C experience.
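As a toy illustration of the forecasting idea — hypothetical numbers and a simple moving average standing in for a production-grade predictive model — a Just-in-Time reorder decision might look like this:

```python
from statistics import mean

# Hypothetical weekly demand for one product (units sold).
weekly_demand = [120, 135, 128, 150, 142, 160]

# Naive forecast: average demand over the last three weeks.
forecast = mean(weekly_demand[-3:])

# Just-in-Time style check: reorder only when a stock-out is projected
# within the supplier's lead time.
on_hand = 200
lead_time_weeks = 2
safety_stock = 30

projected_need = forecast * lead_time_weeks + safety_stock
if on_hand < projected_need:
    print(f"Reorder: need ~{projected_need:.0f} units, only {on_hand} on hand")
else:
    print("Stock is sufficient for the lead time")
```

A real system would replace the moving average with a model fed by the sensor, market, and weather signals mentioned above, but the decision structure — forecast demand over the lead time, compare against stock on hand — stays the same.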

Following are a few points to remember:

1. Identify business goals 
No one should deploy big data without an overall vision for what will be gained. The foundation for developing these goals is your data science and analytics team working closely with subject matter experts. Data scientists, analysts, and developers must collaborate to prioritize business goals, generate insights, and validate hypotheses and analytic models.

2. Make big data insights operational
It's imperative that the data science team works in conjunction with the devops team. Both groups should ensure that insights and goals are operational, with repeatable processes and methods, and that they communicate actionable information to stakeholders, customers, and partners.

3. Build a big data pipeline
The data management and analytics systems architecture must facilitate collaboration and eliminate manual steps. The big data supply chain consists of four key operations necessary for turning raw data into actionable information. 
These include:

  • Acquire and store: Access all types of data from any platform at any latency through adapters to operational and legacy systems, social media, and machine data, with the ability to collect and store data in batch, real-time and near-real-time modes.
  • Refine and enrich: Integrate, cleanse, and prepare data for analysis, while collecting both technical and operational metadata to tag and enrich data sets, making them easier to find and reuse.
  • Explore and curate: Browse data and visualize and discover patterns, trends, and insights with potential business impact; curate and govern those data sets that hold the most business value.
  • Distribute and manage: Transform and distribute actionable information to end-users through mobile devices, enterprise applications, and other means. Manage and support service-level agreements with a flexible deployment architecture.
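The four operations above can be sketched as stages of a tiny in-memory pipeline. Function names, the log format, and the curation threshold are all illustrative, not drawn from any particular product.

```python
# Illustrative sketch of the four pipeline stages on toy sensor-log data.
RAW = [
    "2014-04-17,sensor-a, 21.5 ",
    "2014-04-17,sensor-b,19.0",
    "bad line",
]

def acquire(lines):
    """Acquire and store: collect raw records from a source as-is."""
    return list(lines)

def refine(records):
    """Refine and enrich: parse, cleanse, and tag each record."""
    out = []
    for line in records:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # cleanse: drop malformed rows
        date, sensor, value = parts
        out.append({"date": date, "sensor": sensor, "value": float(value),
                    "source": "legacy-log"})  # enrich with metadata
    return out

def explore(records):
    """Explore and curate: keep the records with business value."""
    return [r for r in records if r["value"] > 20.0]

def distribute(records):
    """Distribute and manage: hand actionable rows to consumers."""
    for r in records:
        print(f"{r['sensor']}: {r['value']}")

distribute(explore(refine(acquire(RAW))))
```

Each stage hands a cleaner, smaller, better-described dataset to the next — the essence of turning raw data into actionable information without manual steps in between.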