{"id":285,"date":"2012-07-30T16:08:18","date_gmt":"2012-07-30T16:08:18","guid":{"rendered":"http:\/\/www.redpointms.com\/blog\/?p=285"},"modified":"2012-07-30T17:11:33","modified_gmt":"2012-07-30T17:11:33","slug":"big-data-data-warehousing-and-data-mining","status":"publish","type":"post","link":"http:\/\/www.rplead.com\/blog\/ecm\/big-data-data-warehousing-and-data-mining\/","title":{"rendered":"Big Data, Data Warehousing and Data Mining"},"content":{"rendered":"<p><a href=\"http:\/\/www.redpointms.com\/blog\/wp-content\/uploads\/2012\/07\/warehouse.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"alignleft  wp-image-288\" title=\"warehouse\" src=\"http:\/\/www.redpointms.com\/blog\/wp-content\/uploads\/2012\/07\/warehouse-300x200.jpg\" alt=\"\" width=\"180\" height=\"120\" srcset=\"http:\/\/www.rplead.com\/blog\/wp-content\/uploads\/2012\/07\/warehouse-300x200.jpg 300w, http:\/\/www.rplead.com\/blog\/wp-content\/uploads\/2012\/07\/warehouse-1024x682.jpg 1024w, http:\/\/www.rplead.com\/blog\/wp-content\/uploads\/2012\/07\/warehouse.jpg 1050w\" sizes=\"(max-width: 180px) 100vw, 180px\" \/><\/a>Michael Koploy from <a href=\"http:\/\/www.softwareadvice.com\/bi\/\">Software Advice<\/a> recently posed a\u00a0<a href=\"http:\/\/blog.softwareadvice.com\/articles\/bi\/buzzword-breakdown-5-experts-tackle-3-definitions-1071212\/\">question<\/a>\u00a0asking for plain definitions of some basic Business Intelligence concepts \u2013 Big Data, Data Warehousing and Data Mining. Although the question seems quite simple, it is thought-provoking because of the changes BI has been undergoing during the last year or two. New developments in this area force us to look at these concepts again. Here is my view on these three topics:<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Big Data<\/strong><\/span><\/p>\n<p><strong>Simple definition:<\/strong><\/p>\n<p>The concept of big data is not new, although it has gained popularity in recent years. 
It describes all the data available to an organization, both structured and unstructured. It is characterized by its large volume, variety and velocity, which make it challenging to analyze. Until recently, organizations tended to limit the amount of information they kept by putting brakes on it and imposing structure through governance and architecture. Too much information was considered a bad thing, because systems had limited capacity to store and process it.<\/p>\n<p><strong>How it is changing:<\/strong><\/p>\n<p>The old saying \u2018garbage in, garbage out\u2019 is no longer true. Organizations have realized that among the garbage there may be a lot of valuable information that could be monetized, directly or indirectly, and used not only to generate revenue but also to gain competitive advantage. The value of information may not be correctly estimated at the time of its creation or during its initially intended use. Value is often defined by context \u2013 to paraphrase, \u201cvalue is in the eye of the beholder\u201d \u2013 and it also varies over time. Traditional BI dealt primarily with structured data, as it was easier to work with and produced results quickly. The rest was mostly ignored or treated as a necessary evil. The problem, however, is that unstructured data constitutes around 80 to 85% of the data within an organization or floating around on the web, and much of it may be related to the business in one way or another. Social networks like Facebook and Twitter, blogs, discussions, memos, emails and so on are all equally valid sources of potentially useful information. 
What separates winners from losers is the ability to see value where others do not, and the ability to use it.<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Data Warehousing<\/strong><\/span><\/p>\n<p><strong>Simple definition:<\/strong><\/p>\n<p>Traditionally, data warehousing is the process of consolidating and aggregating information from various sources within the organization for historical analysis and reporting. The outputs of the analysis are used for operational, tactical or strategic planning. Before the data can be used for these purposes, however, it has to go through cleanup, standardization, normalization, integration and so on. Once stored in the Data Warehouse, it can be aggregated and correlated to answer typical business questions.<\/p>\n<p><strong>How it is changing:<\/strong><\/p>\n<p>Once data is in the Data Warehouse, it becomes relatively non-volatile and time-variant, representing the subject-oriented historical value of the data. Here is the problem in the new world: the process of standardizing and structuring the data often strips away its most valuable part \u2013 the intrinsic relationships within the data, which might not be visible at the time the structuring rules are established. Data Warehouses are usually created with specific goals, and these goals may change relatively quickly. Adjusting a Data Warehouse to fit new goals can be as painful as turning a large ship in a narrow fjord. In light of Big Data, the whole concept will have to be reevaluated.<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Data Mining<\/strong><\/span><\/p>\n<p><strong>Simple definition:<\/strong><\/p>\n<p>In short, it is the discovery of the true meaning of data in large datasets that integrate structured and unstructured data. These datasets might come from data warehouses or from any other data sources. 
Data mining helps answer specific business questions that may be unique and may not have predefined processing paths.<\/p>\n<p><strong>How it is changing:<\/strong><\/p>\n<p>Data mining builds on available data and is thus closely related to the two terms discussed above. As those concepts change, so does the concept of data mining. Organizations need to employ innovative techniques such as statistical tools, semantic analysis, neural networks and artificial intelligence to extract information from a combination of structured and unstructured data and turn it into knowledge. This single step is what separates the wheat from the chaff, and winners from losers \u2013 it is the \u2018holy grail\u2019 of Business Intelligence.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Michael Koploy from Software Advice recently posed a\u00a0question\u00a0asking for plain definitions of some basic Business Intelligence concepts \u2013 Big Data, Data Warehousing and Data Mining. 
Although the question seems quite simple, it is thought-provoking because of the changes that BI &hellip;<\/p>\n<p class=\"read-more\"><a href=\"http:\/\/www.rplead.com\/blog\/ecm\/big-data-data-warehousing-and-data-mining\/\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[34,46,4,5,43,14],"tags":[45,28,40,9,41,8,17,29,30],"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/posts\/285"}],"collection":[{"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/comments?post=285"}],"version-history":[{"count":4,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/posts\/285\/revisions"}],"predecessor-version":[{"id":290,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/posts\/285\/revisions\/290"}],"wp:attachment":[{"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/media?parent=285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/categories?post=285"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.rplead.com\/blog\/wp-json\/wp\/v2\/tags?post=285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}