Tag Archives: Information value

Big Data, Data Warehousing and Data Mining

Michael Koploy from Software Advice recently posed a question about plain definitions of some basic Business Intelligence concepts: Big Data, Data Warehousing and Data Mining. Although the question seems quite simple, it is thought-provoking because of the changes BI has been experiencing over the last year or two. New developments in this area force us to look again at these concepts. Here is my view on these three topics:

Big Data

Simple definition:

The concept of big data is not new, although it has gained popularity in recent years. It describes all the data available to an organization, both structured and unstructured. It is characterized by large volume, variety and velocity, which make it challenging to analyze. Until recently, organizations tended to limit the amount of information by putting brakes and structure on it through governance and architecture. Too much information was considered a bad thing, because systems had limited capacity and organizations had limited capabilities to process it.

How it is changing:

The old saying ‘garbage in, garbage out’ is not true anymore. Organizations have realized that among the garbage there might be a lot of valuable information that could be monetized. This could be done directly or indirectly, and used not only to generate revenue but also to gain competitive advantage. The value of information might not be correctly estimated at the time of its creation or during its initial intended use. Value is often defined by context (to paraphrase, “value is in the eye of the beholder”), and it is also time variant. Traditional BI dealt primarily with structured data, as it was easier to work with and produced results quickly. The rest was mostly ignored or treated as a necessary evil. The problem, however, is that unstructured data constitutes around 80 to 85% of the data within the organization or floating out there on the web, and it could in one way or another be related to the business. Social networks like Facebook and Twitter, blogs, discussions, memos, emails and so on are all sources of potentially useful information. Winners are separated from losers by the ability to see value where others do not, and the ability to use it.


Data Warehousing

Simple definition:

Traditionally, data warehousing is the process of consolidating and aggregating information from various sources within the organization for use in historical analysis and reporting. The outputs of the analysis are used for operational, tactical or strategic planning. Before the data can be used for these purposes, however, it has to go through a process of cleanup, standardization, normalization, integration and so on. Once stored in the Data Warehouse, it can be aggregated and correlated to find answers to typical business questions.

How it is changing:

Once data is in the Data Warehouse it becomes relatively non-volatile and time variant, representing the subject-oriented historical value of the data. Here is the problem in the new world: the process of standardizing and structuring the data often strips out the most valuable part, the intrinsic relationships between data, which might not be visible at the time the structuring rules are established. Data Warehouses are usually created with specific goals, and these goals might change relatively quickly. Adjusting a Data Warehouse to fit new goals might be as painful as turning a large ship in a narrow fjord. In the light of Big Data, the whole concept will have to be reevaluated.


Data Mining

Simple definition:

In short, it is the discovery of the true meaning of data from large datasets that integrate structured and unstructured data. These datasets might come from data warehouses or from any other data sources. Data mining helps to answer specific business questions that might be unique and might not have predefined processing paths.

How it is changing:

Data mining builds on available data and is thus closely related to the two terms discussed above. Since these terms are changing, so is the concept of data mining. Organizations need to employ innovative techniques such as statistical tools, semantic analysis, neural networks, artificial intelligence and so on to extract information from a combination of structured and unstructured data in order to gain knowledge. This single step is what separates the ‘wheat from the chaff’, the winners from the losers. It is the ‘holy grail’ of Business Intelligence.

SharePoint and Information Security

An interesting survey on SharePoint security was recently published by Cryptzone. The results are evidence of the need for, and importance of, information management governance and proper, upfront design of information systems. It appears that in most organizations, responsibility for assigning access rights to SharePoint documents still belongs to IT administrators, as indicated by 69% of respondents. At least this segment of users knew who was in charge, in contrast to the 22% who did not even know who managed it. The problem with ceding responsibility for content protection entirely to IT is that IT’s primary focus is on maintaining and configuring the technical infrastructure, and it has limited knowledge and understanding of the content and its specific protection needs. IT cannot and should not make decisions on how a particular type of information should be protected and who should have access to it.

So who should be responsible for making such decisions? The answer seems intuitive: the business. Yet 43% of respondents said that they do not trust document authors to control who should read their documents. This would indicate that most users have low levels of awareness and understanding of security needs. This seems to be confirmed by another set of responses, which indicated that over 45% of users copied sensitive and confidential information to unprotected USB memory sticks and emails. 55% of these respondents claimed that the reason was the need to send necessary information to users without access to SharePoint, with a further 43% needing it for working at home. Over 30% of users were more concerned about getting the work done than about security, and another 47% did not even think about security or did not care.

One of the contributing factors leading to documents being taken out of SharePoint’s control was the need to share them with third parties: over 56% of respondents said that their organizations did not have external portals to help with collaboration outside of the organization.

The bottom line is that this exposes organizations to risks, including legal risks and intellectual property theft. The proper solution, therefore, is to give some thought, before SharePoint is rolled out, to how information is going to flow across the organization, how it is going to be accessed, how users will be segmented by their needs and how the information is going to be protected. This should lead to the development of information management governance that clearly describes roles and responsibilities across the organization, and how information should be distributed and protected. Lastly, the most important step is to make users aware of security needs by training them on the policies and periodically reinforcing this knowledge.

Information management initiatives – who should be in charge after all?

In 2011 PMI and Forrester jointly published a report, “State of PMO”. Although the report specifically targeted the problems that Project Management Offices face, the interesting thing is that its findings are very relevant to information management implementations. One of the factors measured in the study was the perception of the value that a PMO brings to the organization, and its correlation with organizational reporting lines. The surprising outcome of the report was that while organizations perceived PMOs as high value where they reported to the CEO (38%) or CFO (36%), the approval rate dropped dramatically when PMOs reported to the CIO (22%) or the VP IS/IT (15%). This could lead to the conclusion that the lines of business either:

  1. distrust IS/IT departments,
  2. perceive IS/IT as detached from the business and not addressing their real problems, or
  3. regard the benefits of IT/IS initiatives as potentially intangible and/or never measured after projects are completed

I do not have specific numbers for information management initiatives, but experience seems to confirm a similar correlation. When information management projects are driven by IT rather than by the business, they are often viewed with distrust and receive little confidence and support. Indeed, some IT/IS information management initiatives focus on technology, with poor understanding of business processes, goals and operations. If this is true, then to improve the odds such initiatives should be conceptualized and driven by the business groups rather than by IT. Using the Pareto principle, maybe 80% of the focus should be on business transformation and knowledge management, and 20% on technology. Delivery should still reside within IS, but the business should be firmly in the driver’s seat.

The recent explosion in collaboration methods is blurring the boundaries between external and internal, business and social, stationary and mobile collaboration, bringing new opportunities and challenges. There is no doubt that cloud computing is going to revolutionize the way IS and IT departments work today. IT is increasingly becoming a commodity, and some jobs are quickly disappearing, although a recent IDC study brought news that cloud services are going to generate 14 million new jobs by 2015. Too bad that they are going to be in some other, cheaper part of the world.

This trend will also force a redefinition of the role of the CIO, maybe putting ‘Information’ back into the title, and changing the focus from infrastructure and technology to the identification, valuation, definition of metrics and management of information like any other enterprise asset. I believe that both shifting responsibility for information management initiatives to the business and recognizing that information is an asset will increase the success rates of IM initiatives within organizations, leading to improved profits, reduced risks internally and better service to customers externally.

Transition – Data, Information, Knowledge, Wisdom

I looked at the relationship between the concepts of Data, Information, Knowledge and Wisdom in one of my previous posts. At the time, however, I was looking at it from a slightly different perspective. In this post I focus more on the factors that influence the transition of collected raw data into an entity as abstract as wisdom.

Concept: Data
Definition: Simplest representation of facts such as numbers, characters, graphics, images, sound and video. Initially in ‘raw’ format, needs to be further processed to gain meaning.
Factors contributing to transition: Associated metadata is required to add context, describing business understanding, format, date/time, importance and others.
Abstraction level: Low

Concept: Information
Definition: Processed collection of data, with associated metadata describing the context. There might be various metadata dimensions, allowing the creation of new information and meaning based on different aggregations of facts. It is Data in a context.
Factors contributing to transition: Identification of trends, patterns, relationships and assumptions.
Abstraction level: Medium

Concept: Knowledge
Definition: Awareness, understanding, familiarity, recognition of situational patterns and trends, based on synthesis of collected information that could be used to achieve a business purpose. It is Information in a perspective.
Factors contributing to transition: Acquiring of skills through experience or education. It includes perception, learning, communication, association and reasoning.
Abstraction level: Medium High

Concept: Wisdom
Definition: Making the best use of knowledge, acting with appropriate judgement in complex and dynamic environments, in a way that actually achieves the business purpose. Directly related to maturity but not related to how long the organization has been in business. It is applied knowledge.
Abstraction level: High


Graphically, this could be presented in the form of a pyramid, with maturity and abstraction level increasing towards the top.


As the abstraction level increases, the concepts become much more difficult to define and describe. Wisdom, for example, in contrast to Data, becomes more of a philosophical idea. The higher the level of abstraction, the fewer organizations can be found utilizing the concept. This is not surprising, given the direct relationship with maturity levels. However, this is the critical factor that differentiates winners from the rest. Most organizations focus their resources on achieving immediate tactical goals. This works well in the short term, but as we can usually see, such organizations survive only in a friendly business environment. As soon as market trends change, they are endangered by takeovers or breakups. Only a few are able to make this transition, although I don’t think any have fully achieved the Wisdom level. Information management does not contribute directly to the products or services that organizations sell, but like the nervous system in an organism, it is critical to using the available resources to their full potential. The better the distribution, sharing and collaboration, the better the odds of winning with innovative products, and of survival.

Information Management Trends

Recently, while doing some research, I found in my documents a reference to an old Gartner report on knowledge worker productivity and its relationship to search. The report was from 2002 and stated that knowledge workers spent between 30 and 40% of their time searching for information, and that they were successful less than 50% of the time. According to Kathy Harris and Regina Casonato, employees got 50 to 70% of their information from other people rather than from their search results.

This referred to both electronic and physical documents. Physical documents are usually better organized; electronic ones often quickly become an information dump. Since then, new tools have been adopted and the ratio of electronic to paper documents has increased. With the wide adoption of tools like SharePoint, instant messaging, mobile phone texting, Twitter and so on, there has been a dramatic increase in the amount of information being created and transmitted. Are we better at information management now than we were then? I don’t think so. Although search capabilities have improved and we use more powerful processors, full content search is still not the answer. Are we capturing more contextual information to help with targeted search? After seeing multiple implementations of SharePoint, my answer is mostly no. Setting up SharePoint sites has often become too easy, without proper thought put into the development of information architecture and governance. Very soon such installations turn into an information junkyard.

So what was the cost of lost productivity back in 2002? Assuming that the average fully loaded salary of a knowledge worker was about $80,000 per year, 30% comes to about $24,000 per worker. These costs are mind-blowing, especially if we take into account the success rate of less than 50%. This raises a question: could these costs be used in ECM business cases to support financial benefits, without accountants rejecting them as purely soft benefits? I touched on this in my previous blog post, and Jason White suggested the interesting concept of using Business Intelligence tools to identify these benefits. But how do we do it before we have ECM tools in place?
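The arithmetic behind this estimate can be laid out as a short Python sketch. The salary and the 30 to 40% search share come from the text above; the function name, and the idea of counting the time spent on unsuccessful searches as wasted cost, are my own illustrative assumptions rather than anything taken from the Gartner report. Since the reported success rate was less than 50%, the wasted amounts shown are a lower bound.

```python
def search_cost_per_worker(salary, search_pct, success_pct):
    """Return (yearly cost of search time, part of it spent on failed searches).

    Percentages are whole numbers (30 means 30%). Treating failed searches
    as wasted cost is an illustrative assumption, not a Gartner figure.
    """
    total = salary * search_pct / 100
    wasted = total * (100 - success_pct) / 100
    return total, wasted


salary = 80_000  # assumed fully loaded yearly salary, as in the text

# Low and high ends of the 30-40% range, with the 50% success rate as a ceiling
low_total, low_wasted = search_cost_per_worker(salary, 30, 50)
high_total, high_wasted = search_cost_per_worker(salary, 40, 50)

print(f"Search cost per worker: ${low_total:,.0f} to ${high_total:,.0f}")
print(f"Of which unsuccessful:  ${low_wasted:,.0f} to ${high_wasted:,.0f}")
```

Multiplying these per-worker figures by the number of knowledge workers gives a rough order of magnitude for a business case, although it says nothing yet about how much of that cost a given ECM tool would actually recover.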

This relates to today’s report from Gartner on the top 10 tech trends for 2012. Here are a few interesting highlights relevant to information management:

  • The average teenager sends over 4,762 text messages per month. I am sure that busy executives with their Blackberries send fewer than this, but it still shows how quickly the volume of information is increasing.

  • Context-aware computing, which uses information about the end user’s or object’s environment to improve the quality of interaction. Metadata and information architecture come to mind immediately, and their importance will keep growing.

  • The Internet of everything, with pervasive computing linking information-generating input points like cameras, sensors, microphones, image recognition and so on. This is not only about information volume but also about privacy.

  • Next generation analytics. Improvements in processing power will shift analytics from data centers to end user platforms, including mobile devices. This will empower end users to do a lot of the analysis themselves.

So what does all this show? It seems that the problems from 10 years ago have still not been resolved, and information management is still trying to catch up with technology. The focus of information management will have to shift towards proactive development of agile taxonomies, automatic tools to capture and normalize metadata, and support for targeted search, as well as making analytics tools simpler for end users. Hopefully this will translate into increased knowledge worker productivity.

Information as an asset

In some business areas, the concept of an asset is fairly well developed. From one perspective, the key motivator is the usage and maintenance of the assets; from the other, there is the financial aspect: how accountants perceive the assets and how they depreciate over time. The financial aspect is a great motivator to keep the process clean, as it is usually regulated and affects the bottom line of the company. The usage and maintenance of physical assets, on the other hand, is less structured but easily understandable. You can see it, you can touch it, and if you don’t maintain it, it will stop working.

Financial assets are different in certain ways; however, their management process is well developed, as it is the primary vehicle for increasing revenue and is mostly regulated. The same motivating factor of proper accounting plays a significant role.

Information as an asset is a much more difficult concept to grasp, and it is often neglected. One of the reasons is that accountants don’t know how to book it, so the underlying motivating factor described above is simply not there. The only time organizations dig in their heels with regard to information is during contract negotiations, when it comes to protection of intellectual property. Otherwise, information is allowed to float with little structure, oversight, protection or management. However, information is an asset, and as an asset it has its own intrinsic value. True, it is difficult to measure; nevertheless, organizations need to define information as an asset and integrate its life-cycle with their overall operational and financial processes. This becomes even more important when the focus of the company shifts from delivery of physical goods to services.

The same stages of the asset life-cycle apply to information:

  • creation
  • storage and preservation
  • management
  • use
  • disposal

Information shares some attributes with physical assets, for example being time variant. As it ages, the value of information usually decreases, and this needs to be factored in when developing an information value estimation model.
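As a purely illustrative sketch of how age could be factored into such a model, the Python snippet below discounts an initial value estimate with a half-life. The exponential form, the function name and the two-year half-life are all assumptions made for the example; I am not claiming that information actually ages this way, and a real model would also have to account for context, which can raise as well as lower the value over time.

```python
def decayed_value(initial_value, age_years, half_life_years):
    """Estimate current value, assuming it halves every half_life_years.

    Exponential decay is an assumed model chosen for illustration only.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return initial_value * 0.5 ** (age_years / half_life_years)


# Hypothetical document valued at 1000 when created, with a 2-year half-life
for age in (0, 2, 4):
    print(age, decayed_value(1000, age, 2))
```

With these assumed inputs the estimate falls from 1000.0 at creation to 500.0 after two years and 250.0 after four.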

The most important aspect, however, is that the value of information is realized only when it is used. Information that cannot be found is worthless. That is why the ability to search for and find information is a key element of information management. Technology is less important here; the development of the right taxonomy, classification and controlled vocabularies, together with the ability to tag information at the point of creation, plays the key role.
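To make tagging at the point of creation a little more concrete, here is a minimal Python sketch that checks a new document’s tags against a controlled vocabulary before it is stored. The facets, the allowed terms and the function name are all hypothetical; a real taxonomy would come out of the organization’s own information architecture, not from a hard-coded dictionary.

```python
# Hypothetical controlled vocabulary: each facet maps to its allowed terms
CONTROLLED_VOCABULARY = {
    "document_type": {"contract", "invoice", "report"},
    "department": {"finance", "legal", "operations"},
}


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags conform."""
    problems = []
    # Every facet in the vocabulary is treated as required in this sketch
    for facet, allowed in CONTROLLED_VOCABULARY.items():
        value = tags.get(facet)
        if value is None:
            problems.append(f"missing required facet: {facet}")
        elif value not in allowed:
            problems.append(f"'{value}' is not an allowed {facet}")
    # Reject free-form facets that are not part of the vocabulary
    for facet in tags:
        if facet not in CONTROLLED_VOCABULARY:
            problems.append(f"unknown facet: {facet}")
    return problems


print(validate_tags({"document_type": "contract", "department": "legal"}))
print(validate_tags({"document_type": "memo"}))
```

The point of a check like this is less the code than the discipline: the document is classified while its author still knows the context, instead of leaving that context to be guessed later by full content search.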

Therefore, to be successful, organizations need to:

  • Define the information life-cycle and its value
  • Integrate the information life-cycle with the overall operational and financial processes of the organization
  • Define the information architecture and keep it up to date