Monthly Archives: April 2012

Master Data Management and Governance

Microsoft SharePoint 2007 and then 2010 triggered rapid adoption of collaboration and document management systems. Soon many organizations painfully realized the importance of Information Governance. Without it, implementations quickly became digital landfills that replaced shared drives without fixing their problems. Departments often built their own sites, with their own branding, cumbersome and unmanageable security structures, their own metadata, and poor or entirely missing taxonomies, leading to a mess where users couldn’t find anything. Even worse, duplication of documents led to confusion, business decisions based on outdated data, sharply rising storage and backup costs, and deteriorating system performance. Worst of all, information was either never purged or purged at random, which exposed organizations to e-Discovery related legal risks and litigation costs.

To address these problems, organizations needed to develop a set of aligned governance constructs within an overall Information Governance Framework. Among those constructs are Information Security Governance, Information Architecture, Data Quality, Records and Retention, and Master and Reference Data, to mention just a few. I think that the last of these plays a very significant role and should be addressed early to get information under control.

So how could Master Data Management be defined? It is a set of processes, tools and organizational structures through which business and IT work together to address issues like uniformity, accuracy, stewardship, consistency and accountability of the organization’s data. As a result the data becomes authoritative, secure, reliable and sustainable. But not all data should get the same level of attention. Master data is the ‘key’ data gathered and used by multiple departments in the course of business operations, for example customer data and information about products, employees, materials and so on. Master data must contain the most accurate and authoritative data available, and serve as a single source of truth across the organization. Many organizations, however, find it difficult to secure the necessary funding and support from senior management, because the return on investment is hard to measure.

Earlier this year, Gartner published some predictions about Master Data Management governance and its impact on organizations by the end of 2016:

  • Only 33% of organizations that initiated MDM will be able to demonstrate its value. The difficulty here is that such an initiative must take a complete approach and be an ongoing process rather than a one-off, isolated project. This requires consensus among senior executives, which is often quite challenging to obtain.

  • Spending on information governance must increase fivefold to be successful. As per the point above, it needs to include other disciplines within the Information Management Governance Framework, like quality management, lifecycle and retention, privacy and security. This will lead to larger teams focused on governance, and to higher costs.

  • 20% of CIOs in regulated industries will lose their jobs for failing to implement information governance. IM governance is the construct that allows an organization to comply with regulations, and the primary responsibility for it lies with the CIO and Legal Counsel. Breaches of information security, leaks of confidential information, and breaches of privacy will lead to reputational and financial damage to those organizations.

The good news is that many organizations already recognize these risks: according to Gartner, spending on MDM rose by 21% last year.


Office 365 offers entry point to the Cloud but with limitations

Office 365 is making steady progress in capturing the small and medium business market segments with its software-as-a-service office suite, and especially with the cloud-based version of SharePoint 2010. Adoption in larger enterprises, however, is much slower. For many organizations Office 365 is an excellent entry point into cloud services, allowing them to reduce operational costs and physical storage requirements and to make better use of support resources. All of this translates into a lower total cost of ownership, in addition to eliminating more intangible headaches and risks like software updates or upgrades.

However, quite a few organizations still have concerns related to security, reliability, ownership of data and privacy, or do not know what to do with their existing on-site installations and investments. Honestly speaking, with regard to security and reliability, cloud services are usually better than in-house operations for most organizations. Cloud companies like Amazon, Microsoft or Rackspace have whole teams dedicated to these subjects, monitoring servers 24/7. Ownership of the data shouldn’t be an issue either, since the data is not shared, even in a multitenant environment (Microsoft offers two models, multitenant and dedicated; the latter might be an option for those who are obsessed with information protection).

Deciding what to do with existing SharePoint installations, and privacy, are valid concerns. In some jurisdictions (Canada and the European Union among them), passing information that includes personal data of users or clients across borders is legally restricted. Microsoft recently announced a cloud solution that would secure and limit the boundaries of information transfer specifically to address government requirements, but so far this is limited to the US. Also, the Microsoft SharePoint offering that is part of the Office 365 suite does not provide all the features that on-site installations have. Some of the missing features:

  • Lack of FAST search solution
  • Lack of integration with Microsoft Information Rights Management
  • Lack of ability to index external databases from SharePoint search
  • Lack of Performance Point Services
  • Lack of support for external lists

So, for organizations that need more sophisticated configurations, this might not be the best option – at least for now.

There is, however, another possibility: companies that really want to move into the cloud could try hybrid solutions. Assuming that such organizations have good information architecture and defined business processes, they could partition data and processes in such a way that critical information is handled by in-house installations, and the rest is stored and processed in the cloud. Integrating the data might require building mash-up portals for the end users, so it calls for careful thought before implementation and solid governance in place. It is important, however, to understand the limitations of such a solution; for example, federated search across cloud and on-premises data will not work. The key success factor for such an implementation would be a solid understanding of the business requirements, and alignment with the overall long-term goals of the organization. Cloud solutions do bring quite a few benefits, and Microsoft is working on closing some of the gaps.

Text analytics and business intelligence

Text analytics has been getting more popular recently. For years it was perceived as a stepchild of business intelligence. I recently saw research results indicating that most organizations that had implemented business intelligence were still waiting to realize their ROI. I think the problem is that BI, in its current narrow definition of dealing primarily with structured data, gives only partial answers to business questions. After all, only 15 to 20% of the information that organizations deal with is structured.

Interestingly, the concept of business intelligence was first introduced in the IBM Journal in the 1950s by Hans Peter Luhn in his article “A Business Intelligence System”. He defined it as “automatic method to provide current awareness services to scientists and engineers” and “interrelationships of presented facts in such way as to guide action towards desired goal”. Luhn did not refer selectively to structured data; as a matter of fact, part of his life was devoted to solving the problems of information retrieval and storage faced by libraries and by document and records centers. Even for IBM, computerized methods were still at a very early stage in the 1950s. Over the years, however, as computers became part of business life, the analysis of data took the path of least resistance: the exploration of data that is structured and, by its nature, fairly straightforward to compare, categorize and mine for trends, data to which one could apply mathematical models. Thus, over time, structured data analysis became almost synonymous with business intelligence. Text analytics survived mainly in business domains such as market research and pharma.

Recently, however, text analytics has been experiencing a renaissance, and there are several reasons for this. One is national security: governments are spending billions of dollars on the development of analytical tools that allow them to search ‘big data’ in the shortest possible time to identify threats. Another is that many organizations have realized that they need to listen more to their customers, hence market research disciplines like customer experience management, enterprise feedback management or voice of the customer in CRM are booming. A third factor accelerating the rise of text analytics is the change in the way we communicate, brought about by the latest social technologies and ‘big data’.

The concept of ‘big data’ is somewhat misleading. After all, storage cost and size are not a problem: many companies selling cloud services offer a few gigabytes here and there for free. The issue is not so much the size of the data as the degree of its ‘unstructuredness’. To make sense of the information stored, and to make use of it, organizations need methods, tools and processes to digest and analyze the data. In the last sentence I made a purposeful distinction between data and information: the former is a set of raw facts, while information is data put in context, creating specific meaning for the user. The speed of change in the way we communicate already makes the term ‘text analytics’ old-fashioned. We are now talking about the analysis of all types of unstructured data: not only text, but also voice messages, videos, drawings, pictures and other rich media.

So what is text analytics about? It is simply a set of techniques and models for turning text into data that can be further analyzed, as in traditional business intelligence, allowing organizations to respond to business problems. By generating semantics, text analytics provides a link between search and traditional business intelligence, turning data retrieval into an information delivery mechanism. The process discovers and presents the uncovered facts, business rules and relationships. Several analytical methods are employed in this process, using statistical, linguistic and structural techniques. Here are a few examples:

  • Named entity recognition – identifying names of people, organizations, locations, symbols and so on in textual sources
  • Similarity detection and disambiguation – using contextual clues to determine, for example, that the word “bass” refers to the fish and not to the instrument
  • Pattern based recognition – using regular expressions, for example to identify and standardize phone numbers, emails, postal codes and so on
  • Concept recognition – clustering data entities around defined ideas
  • Relationship recognition – finding associations between data entities
  • Co-reference recognition – resolving multiple terms that refer to the same object, which can be quite complex – in the two sentences below the pronoun “He” refers to a different person in each:
    • Paul gave money to Stephen. He had nothing left.
    • Paul gave money to Stephen. He was rich.
  • Sentiment techniques – subjective analysis to discover attitude based on source data – opinion, mood, emotion, sentiment
  • Quantitative analysis – extracting semantic or grammatical relationships between words to find meaning
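
To make the pattern based recognition item above more concrete, here is a minimal Python sketch that finds emails, phone numbers and postal codes in free text and standardizes them. The patterns, the function name extract_entities and the sample values are assumptions made up for illustration: they cover only simple North American phone numbers and Canadian-style postal codes, and a real deployment would need patterns tuned to its own data.

```python
import re

# Hypothetical patterns for illustration only; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Ten-digit phone numbers such as (416) 555-0199 or 416.555.0123.
PHONE_RE = re.compile(r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)")
# Canadian-style postal codes such as M5V 2T6 (letter-digit-letter digit-letter-digit).
POSTAL_RE = re.compile(r"\b([A-Za-z]\d[A-Za-z])[\s-]?(\d[A-Za-z]\d)\b")


def extract_entities(text):
    """Find emails, phone numbers and postal codes and return them standardized."""
    return {
        # Emails are lower-cased so that the same address compares equal.
        "emails": [m.group(0).lower() for m in EMAIL_RE.finditer(text)],
        # Phone numbers are rewritten as NNN-NNN-NNNN.
        "phones": ["-".join(m.groups()) for m in PHONE_RE.finditer(text)],
        # Postal codes are upper-cased with a single space in the middle.
        "postal_codes": [
            f"{m.group(1)} {m.group(2)}".upper() for m in POSTAL_RE.finditer(text)
        ],
    }


if __name__ == "__main__":
    sample = (
        "Contact Paul at Paul.Smith@Example.com or (416) 555-0199. "
        "Office: m5v 2t6, alt phone 416.555.0123."
    )
    for kind, values in extract_entities(sample).items():
        print(kind, values)
```

Once values are standardized this way they behave like structured data: they can be counted, joined against customer records or checked for duplicates. The other techniques in the list, such as disambiguation or co-reference, generally need linguistic or statistical models rather than patterns alone.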

As we can see, extracting information from unstructured data can be immensely difficult, although it is not an impossible task. It does, however, require considerable commitment from the organization. If done properly, it can help address many of the problems that enterprises face today. These problems are related to perceived information overload, poor information governance, and low-quality metadata that leads to poor findability and knowledge. This in turn impacts organizational productivity. As for information external to organizations, the ‘big data’ question of how to monetize social media will drive new technological and business solutions. It seems that this is one of the areas that will experience substantial growth. Using only ‘transactional’ business intelligence based on structured information is insufficient for organizations to get the full picture. The solution is rather ‘integrated business intelligence’, which combines structured and unstructured data to answer business questions.