In view of the need to handle increasingly larger amounts of data and to better cope with the ensuing big data V-characteristics (Volume, Velocity, Variety, Veracity etc.), several large organisations have begun migrating their standard legacy Enterprise Data Warehouse (EDW) systems to Big-Data-Driven-Enterprise (BDDE) schemes, which subsume the EDW.
Many of these schemes utilise the Hadoop ecosystem supported by a variety of additional data mining and business analytics sub-systems. In this Assignment, you will look at how big organisations migrate from standard EDW to BDDE schemes (that incorporate the EDW) and will investigate the key issues that often have to be dealt with in such migrations.
Complete the following tasks:
- Describe a real or an imaginary company that you want to transform to the BDDE type (you can also use your current employer as an example, if applicable). Select the industry sector and your particular business activity. Your company's operations must include a Web presence: you must define what service this Web presence provides to its customers and additionally explain how data will be used and collected.
- Describe data flows in your company. Identify where you will use data collected from the Web and other sources of information such as sales statistics, user activity or social media data. Describe how you plan to integrate new types of big data with current EDW data workflows that primarily use relational data. Do not forget about operational aspects such as data backups and, if applicable, long-term data storage. Data provenance may be needed in addition to regular activity and communication logs.
- Select a suitable platform for general big data management (e.g. in-house or cloud-based infrastructure) and a Big Data Management System (BDMS) platform with vendor (e.g. Hadoop supplied by Cloudera CDH, AWS EMR or Hortonworks). You should provide sufficient detail about the BDMS components and tools that you intend to use within the BDDE scheme.
- Provide suggestions for how you will address security and privacy issues when managing your customer data and your company's data. Again, do not forget about regular backups and secure backup storage for the BDDE.
- Define a data management policy, including data protection and access control. Briefly address a majority of the CSA top ten security and privacy challenges.
- Suggest what big data analytics and visualization methods you will use, including specific commercial tools and platforms.
Introduction to Company
Database-hosting systems exist to support reporting and the sharing of information within organisations. SlicingDice, which offers extensive data warehousing and analytic database services, is the focus of this paper. Storing, loading, querying and visualising data across a composite data estate is a constant challenge for engineers, and SlicingDice fills this gap by allowing companies to simply insert and query a wide array of real-time, historical or time-series data without any server management. This responds to users' growing demand for instant, efficient and secure modes of access: given the sheer amount of information held across an enterprise, retrieving data efficiently requires coordinated effort between existing systems.
Since the emergence of Enterprise Data Warehouses about 30 years ago, the EDW has become not only popular but an essential facet of SlicingDice's business intelligence operations, and the technology is credited with a real impact on the organisation's overall results. As implemented in the company, the EDW is a database that holds all data related to the company, provides information to all authorised users, and supports thorough analytical work. Through a detailed and accessible reporting layer, the SlicingDice warehouse can also be used in different ways by third-party organisations. However, as the company's needs change, it is now considering integrating big data, via Hadoop, into the existing EDW architecture.
The move aims to couple the company's data warehouse with big data, creating a hybrid model. Highly structured, optimised operational data will remain in the tightly controlled data warehouse, while the Hadoop-based infrastructure will handle, in real time, all data that is subject to change as well as highly distributed data. SlicingDice recognises a business need to combine the conventional data warehouse, its historic sources of business data, and less-integrated big data sources, and it has therefore settled on the hybrid model, since it supports both orthodox and big data sources in meeting the company's daily goals.
SlicingDice is designed to provide data-querying services to any company with a Web presence, as described below. First, it seeks to protect clients' financial health so that executives such as the CFO, CEO and CIO need not overspend: virtually unlimited data storage is offered under pricing models determined by the volume of inserted data. Its pricing is simple and transparent, allowing clients to choose between inexpensive models such as pay-per-column or pay-per-gigabyte; a rough comparison of the two models is sketched below. The platform also provides an entirely free database trial through a live demo runnable from a third-party website. Lastly, SlicingDice does not lock clients in: an account can simply be terminated and the data exported immediately.
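To make the comparison between the two volume-based models concrete, the following sketch contrasts a pay-per-column rate with a pay-per-gigabyte rate. The prices, function names and example figures are illustrative assumptions for this sketch, not SlicingDice's actual price list.

```python
# Hypothetical comparison of two volume-based pricing models.
# The rates below are invented for illustration, not actual SlicingDice prices.

PRICE_PER_COLUMN = 2.50   # assumed monthly cost per stored column (USD)
PRICE_PER_GB = 0.40       # assumed monthly cost per stored gigabyte (USD)

def monthly_cost_per_column(num_columns: int) -> float:
    """Cost under a pay-per-column model."""
    return num_columns * PRICE_PER_COLUMN

def monthly_cost_per_gb(stored_gb: float) -> float:
    """Cost under a pay-per-gigabyte model."""
    return stored_gb * PRICE_PER_GB

if __name__ == "__main__":
    # A narrow but voluminous dataset favours pay-per-column;
    # a wide but small dataset would favour pay-per-gigabyte.
    print(monthly_cost_per_column(num_columns=40))  # 100.0
    print(monthly_cost_per_gb(stored_gb=500))       # 200.0
```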
Data Flow
Recognising engineers' need for peace of mind, SlicingDice offers convenient features that let them focus on insight and creativity. First is the complete absence of server management, a solution that removes all infrastructure worries. Second, high availability and backup are provided through built-in redundancy across three independent data centres; these are configured independently and run simultaneously, allowing continuous service while clients insert and query data. Third, SlicingDice imposes virtually no maintenance burden: one simply inserts data and queries it through SQL or the site's REST API, with trivial need for tuning or tweaking databases, and storage grows transparently without any intervention. Ready-made client libraries make it easy to integrate SlicingDice with other frameworks, with native clients available for popular languages including Java, Python, JavaScript, PHP, Arduino, Ruby and .NET (Sharif and Cooney 2015).
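As a minimal sketch of that insert-and-query workflow, the snippet below posts a record and runs an SQL query over HTTP. The base URL, header names and payload shapes are hypothetical stand-ins, not SlicingDice's documented API.

```python
import requests

# Hypothetical base URL and API key; the real service's endpoints may differ.
BASE_URL = "https://api.example-dbaas.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY",
           "Content-Type": "application/json"}

# Insert a record (assumed payload shape: table name plus column/value pairs).
record = {"table": "orders", "values": {"customer_id": 42, "total": 19.90}}
resp = requests.post(f"{BASE_URL}/insert", json=record, headers=HEADERS, timeout=30)
resp.raise_for_status()

# Query via an assumed SQL-over-HTTP endpoint.
query = {"sql": "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"}
resp = requests.post(f"{BASE_URL}/query", json=query, headers=HEADERS, timeout=30)
print(resp.json())
```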
Another reason for building the hybrid data warehouse into SlicingDice is the need for assured backward compatibility: any existing use of the API will keep working even if a client company declines to upgrade its code. Being a fully managed service, SlicingDice does not delegate responsibilities that would require cooperating with numerous companies to link dissimilar technologies; it provides extract-transform-load (ETL), data warehousing and visualisation capabilities itself. Similarly, data can be exported from SlicingDice at any time without incurring additional costs.
The intended use of big data (often discussed under the labels NoSQL or Hadoop) and advanced analytics is driven by the parallel rise of the Web, social networking and mobile devices, which have fundamentally altered the nature of data. Unlike conventional corporate data, big data is highly distributed, loosely structured and increasingly huge in volume. There are several ways to process and analyse big data, most sharing similar characteristics: scaling out on commodity hardware, processing concurrently, and employing unconventional data storage to handle both unstructured and semi-structured data. The company will likewise apply advanced analytics and data-visualisation technology to big data in order to surface insights for clients. The approaches intended to transform SlicingDice's business analytics and data management are Apache Hadoop and NoSQL.
The strategy for Big Data and the EDW to coexist in SlicingDice's architecture involves several steps, led by determining whether and where Hadoop can fit in the existing data warehouse. As a family of products with numerous capabilities, Hadoop can contribute to several areas of the EDW. It is most useful as a data platform for capturing and storing big data across a diverse data warehouse environment, and for supplementing the data analysis performed on other platforms (Demchenko 2014). This approach lets the firm protect its investment in the EDW framework while stretching it to accommodate the big data environment. Following the integration, Hadoop will take on a number of roles within the EDW architecture, as described in the following part of this paper.
Suitable Platform for Big Data
The first role involves staging data to be processed by the EDW, producing source data for particular functions including reporting, analytics and loading into downstream databases, roles currently performed by native ETL techniques. Hadoop is expected to let the company deploy a highly scalable and frugal ETL framework. A well-known ETL use case, for instance, offloads heavy transformation from the data warehouse into Hadoop, the "T" in ETL (Smith 2018). This solution arose because the organisation as a whole was struggling to scale its conventional ETL architecture: numerous data-integration platforms pushed their work onto the data warehouse, which is why contemporary data-integration workloads in the EDW climb to around 80% of database volume and resources. The result is untenable expenditure, unending maintenance and inadequate response to users' queries. By shifting the "T" to Hadoop, SlicingDice can cut costs dramatically and free up database volume and resources, yielding faster execution of users' queries.
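A minimal sketch of such a "T" offload follows, assuming a PySpark job running on the Hadoop cluster. The HDFS paths and column names are illustrative assumptions, not SlicingDice's actual pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Offload the transform ("T") step from the EDW into a Hadoop/Spark job.
spark = SparkSession.builder.appName("edw-transform-offload").getOrCreate()

# Hypothetical raw extract landed on HDFS by the "E" step.
raw = spark.read.csv("hdfs:///staging/orders_raw.csv",
                     header=True, inferSchema=True)

# Heavy aggregation that previously ran inside the warehouse.
daily_totals = (
    raw.groupBy("order_date", "customer_id")
       .agg(F.sum("total").alias("daily_total"),
            F.count("*").alias("order_count"))
)

# Write a compact, load-ready Parquet file for the "L" step into the EDW.
daily_totals.write.mode("overwrite").parquet("hdfs:///curated/daily_totals")
```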
Archiving data conventionally offered only three alternatives: leave it in a relational database, move it to an offline tape library for storage, or discard it. Hadoop's scalability and minimal cost change the organisation's options, allowing data to be kept entirely in a promptly accessible online environment. The next step concerns flexibility. Existing data warehouse implementations are built on relational database management systems, which are well suited to storing highly structured data drawn from databases, Enterprise Resource Planning (ERP) systems and Customer Relationship Management (CRM) systems. Hadoop, by contrast, offers fast and easy ingestion of any data format, from evolving schemas that support website variant tests to schema-less content such as multimedia files; an ingestion sketch follows below. The implemented Hadoop architecture will enhance the organisation's flexibility to process data using the Hadoop/NoSQL approach.
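As a brief sketch of that schema-on-read flexibility, the snippet below ingests semi-structured JSON event data without a predefined schema. The HDFS paths and the partition field are assumed for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-ingest").getOrCreate()

# Ingest semi-structured clickstream JSON as-is; Spark infers the schema
# at read time instead of requiring an up-front relational design.
events = spark.read.json("hdfs:///landing/web_events/*.json")
events.printSchema()

# Persist in a columnar format, partitioned for later analysis.
# ("event_date" is an assumed field in the incoming events.)
events.write.mode("append").partitionBy("event_date").parquet(
    "hdfs:///archive/web_events"
)
```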
The resulting system will be used to handle contemporary data types and to enable sequential processing, which is expected to be a potent method for time-series analysis and gap-detection use cases. As mentioned earlier, the architecture supports a number of programming languages, giving it more capability than SQL alone. The method for augmenting and bolstering the organisation's EDW with a Hadoop cluster involves the following steps. First, continue storing succinct structured data from OLTP and back-office systems in the EDW. Second, store unstructured data in Hadoop/NoSQL, since it rarely fits neatly into tables; major client communications, including call logs, client feedback, GPS locations, pictures, messages, tweets and emails, will therefore be kept in Hadoop. Lastly, correlate the data in the EDW with the clustered data in Hadoop to derive sharper insight into clients, merchandise and equipment; a correlation sketch follows below. SlicingDice will thus be able to run ad-hoc analytics and clustering, scoring models against the correlated data in Hadoop, work that is extremely computation-intensive.
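A minimal sketch of that correlation step follows, assuming a Spark job that pulls a customer dimension from the EDW over JDBC and joins it with call-log data already in Hadoop. The connection details, table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edw-hadoop-correlation").getOrCreate()

# Pull a structured dimension table from the EDW over JDBC.
# (Hypothetical connection string; assumes the JDBC driver jar is on the
# Spark classpath.)
customers = (
    spark.read.format("jdbc")
         .option("url", "jdbc:postgresql://edw-host:5432/warehouse")
         .option("dbtable", "dim_customer")
         .option("user", "analyst")
         .option("password", "***")
         .load()
)

# Semi-structured call logs already stored in Hadoop.
call_logs = spark.read.parquet("hdfs:///archive/call_logs")

# Correlate EDW and Hadoop data to profile client contact behaviour.
contact_profile = (
    call_logs.join(customers, on="customer_id", how="inner")
             .groupBy("customer_segment")
             .count()
)
contact_profile.show()
```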
Apache Hadoop was originally designed without a security model, but its popularity has brought scrutiny from professionals. A number of threats have been reported against Hadoop clusters in which an attacker could run code while impersonating a Hadoop user. However, the recent growth of the Hadoop security marketplace will allow SlicingDice to adopt vendor-released enhancements based on Intel's secure Hadoop distribution (Haber 2018). Similarly, Apache Accumulo will be used to provide techniques for adding cell-level security on top of Hadoop.
As in the 0.20.20x distribution, security concerns for customer data will be addressed through the following measures (an ACL configuration sketch follows the list):
- Enforcing HDFS file permissions as access control to files, using access control lists (ACLs) covering our clients and groups.
- Using delegation tokens for subsequent authentication checks between clients and services after their initial authentication, minimising the potential overhead of repeated authentication.
- Creating job tokens to enforce task authorisation: tokens issued by the JobTracker and passed to TaskTrackers ensure that tasks can only operate on the jobs to which they are assigned.
- Encrypting network connections using SASL with a Quality of Protection (QoP) setting that ensures confidentiality at the network level; MapReduce transfers, as well as the web consoles, will be configured to use SSL.
- Configuring HDFS file transfer for encryption.
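As a minimal sketch of the ACL item above, the snippet below sets an HDFS ACL through the WebHDFS REST API, which supports SETACL and GETACLSTATUS operations. The NameNode host, user and path are assumptions, and a production cluster would authenticate via Kerberos/SPNEGO rather than the simple user parameter shown here.

```python
import requests

# Hypothetical NameNode WebHDFS endpoint; adjust host/port for your cluster.
NAMENODE = "http://namenode.example.com:9870"
PATH = "/data/customers"

# Grant read-only access to the 'analysts' group and deny others, using the
# WebHDFS SETACL operation (requires dfs.namenode.acls.enabled=true).
aclspec = "user::rwx,group::r-x,group:analysts:r--,other::---"
resp = requests.put(
    f"{NAMENODE}/webhdfs/v1{PATH}",
    params={"op": "SETACL", "aclspec": aclspec, "user.name": "hdfs"},
    timeout=30,
)
resp.raise_for_status()

# Verify the resulting ACL.
status = requests.get(
    f"{NAMENODE}/webhdfs/v1{PATH}",
    params={"op": "GETACLSTATUS", "user.name": "hdfs"},
    timeout=30,
)
print(status.json())
```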
Nevertheless, a number of challenges remain serious security concerns for Hadoop, as outlined by Boris and Alexey. The first is how to strengthen authentication for clients using the web consoles. The second is how to prevent rogue services from impersonating a genuine Hadoop service. The third is enforcing access control over stored data in line with existing policies and client certificates. A fourth challenge faced by SlicingDice involves implementing Attribute-Based Access Control (ABAC), which extends the more familiar Role-Based Access Control (RBAC) (Moura and Serrão 2018).
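To make the RBAC/ABAC distinction concrete, here is a toy sketch under assumed role and attribute names; the roles, attributes and rules are invented for illustration, not drawn from any SlicingDice policy.

```python
# Toy illustration of RBAC vs ABAC; roles, attributes and rules are invented.

ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
}

def rbac_allows(role: str, action: str) -> bool:
    """RBAC: the decision depends only on the subject's role."""
    return action in ROLE_PERMISSIONS.get(role, set())

def abac_allows(subject: dict, action: str, resource: dict) -> bool:
    """ABAC: the decision also weighs subject and resource attributes."""
    if action == "read":
        # Example rule: sensitive data is readable only from a trusted network
        # by subjects whose clearance matches the resource's sensitivity.
        return (subject["clearance"] >= resource["sensitivity"]
                and subject["on_corporate_network"])
    return False

print(rbac_allows("analyst", "read"))  # True
print(abac_allows(
    {"clearance": 2, "on_corporate_network": False},
    "read",
    {"sensitivity": 1},
))  # False: sufficient clearance, but wrong network
```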
A Hortonworks-developed approach will be used for the data management policy in Apache Hadoop. The data governance and protection policies needed to meet enterprise requirements include classification-based, placement-based, data-expiration-based and prohibition-based policies (Hortonworks 2018); a sketch of an expiration-based policy follows below. The company will use Datameer for Apache Hadoop to visualise data trends across its various analyses. The software was chosen because it was designed for actively visualising data and for integrating and preparing numerous data types. Its end-to-end approach makes Hadoop simpler for the company's audiences, and it combines Hadoop compatibility with a self-service data-integration tool.
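As a minimal sketch of an expiration-based policy, the snippet below assumes retention is enforced by a scheduled job that deletes HDFS files older than a retention window via the WebHDFS LISTSTATUS and DELETE operations. The host, directory and retention period are illustrative assumptions.

```python
import time
import requests

# Hypothetical NameNode endpoint, archive directory and retention window.
NAMENODE = "http://namenode.example.com:9870"
ARCHIVE_DIR = "/archive/web_events"
RETENTION_DAYS = 365

cutoff_ms = (time.time() - RETENTION_DAYS * 86400) * 1000

# List files with WebHDFS; modificationTime is reported in milliseconds.
listing = requests.get(
    f"{NAMENODE}/webhdfs/v1{ARCHIVE_DIR}",
    params={"op": "LISTSTATUS", "user.name": "hdfs"},
    timeout=30,
).json()

for status in listing["FileStatuses"]["FileStatus"]:
    if status["modificationTime"] < cutoff_ms:
        # Delete expired files to enforce the expiration-based policy.
        requests.delete(
            f"{NAMENODE}/webhdfs/v1{ARCHIVE_DIR}/{status['pathSuffix']}",
            params={"op": "DELETE", "recursive": "true", "user.name": "hdfs"},
            timeout=30,
        ).raise_for_status()
```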
References
Demchenko, Y., De Laat, C. and Membrey, P. (2014). Defining architecture components of the Big Data Ecosystem. In: 2014 International Conference on Collaboration Technologies and Systems (CTS). IEEE, pp. 104-112.
Hortonworks. (2018). Apache Hadoop Data Governance, Security and Operations | Hortonworks. [online] Available at: https://hortonworks.com/solutions/security-and-governance/ [Accessed 8 Oct. 2018].
Haber, M. (2018). The data governance story: How to develop policies & rules. [online] IBM Big Data & Analytics Hub. Available at: https://www.ibmbigdatahub.com/blog/data-governance-story-how-develop-policies-rules [Accessed 8 Oct. 2018].
Moura, J. and Serrão, C. (2018). Security and Privacy Issues of Big Data. [online] Arxiv.org. Available at: https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf [Accessed 8 Oct. 2018].
Smith, K. (2018). Big Data Security: The Evolution of Hadoop’s Security Model. [online] InfoQ. Available at: https://www.infoq.com/articles/HadoopSecurityModel [Accessed 8 Oct. 2018].
Sharif, A., Cooney, S., Gong, S. and Vitek, D. (2015). Current security threats and prevention measures relating to cloud services, Hadoop concurrent processing, and big data. In: 2015 IEEE International Conference on Big Data (Big Data). IEEE, pp. 1865-1870.