Screen capture of final EDA process
Conduct an exploratory data analysis (EDA) of the housing.csv data set using the RapidMiner Studio data mining tool. Provide the following:
(i) a screen capture of your final EDA process and a brief description of that process;
(ii) a summary of the key results of your exploratory data analysis in Table 2.1, Results of Exploratory Data Analysis for housing.csv;
(iii) a discussion of the key results of the exploratory data analysis presented in Table 2.1, with a rationale for selecting the top 5 variables for predicting house values and, in particular, their relationship with house values, drawing on the results of the EDA and relevant literature (about 250 words).
Table 2.1 should include the key characteristics of each variable in the housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.
The Statistics tab and the Chart tab in RapidMiner Studio provide a great deal of descriptive statistical information and the ability to create useful charts, such as bar charts and scatterplots, for the EDA. You might also run correlations and/or chi-square tests, as appropriate for the housing.csv data set, to determine which variables contribute most to predicting house values.
Build a Linear Regression model for predicting house value using a RapidMiner data mining process and an appropriate set of data mining operators and a reduced set of variables from the housing.csv data set as determined by your exploratory data analysis in
(i) A screen capture of Final Linear Regression Model process and briefly describe your Final Linear Regression Model process
(ii) A table named Table 2.2, Results of Final Linear Regression Model for Task 2.2 for housing.csv data set.
(iii) Discuss the results of the Final Linear Regression Model for the housing.csv data set, drawing on the key outputs (coefficients, standardised coefficients, t-statistics, p-values, significance levels etc.) for predicting house values and on relevant supporting literature on the interpretation of a Linear Regression Model.
Create a Geomap Graph view that displays house values and other relevant data using housing.csv. Comment on (1) the process of preparing a Geomap Graph view using Tableau Desktop and (2) the key trends and patterns apparent in the Geomap Graph created for visual presentation of the housing.csv data set. Note that this house values data set is drawn from the State of California, USA, so you need to modify the Map menu Edit Locations options in Tableau so you can plot latitude and long
ANS: Organizational culture is a phenomenon built on fundamental assumptions about internal and external integration that are shared and acquired through teamwork and that define the accepted ways of perceiving and relating, which are passed on to new members. Culture forms an important part of the institutional ethos. It nurtures synergy and leadership norms, which ultimately underpin the stability of the institution. A significant benefit of viewing organizations through the lens of culture is a deeper understanding of their variation and richness. Shared values about social reality and the dynamic contributions of individuals are vital ingredients of the institutional philosophy. Each organization is unique and unitary, characterized by a stable set of meanings, and can be seen as a mini-society alongside its mainstream workforce, which accentuates the pluralism of institutions (Linnenluecke and Griffiths, 2010).
Decision making depends on assumptions that are guided by the aspirations associated with the decision itself. Assumptions and context are external aspects, but organizational knowledge depends on the data held in its systems. Heterogeneous data are not consistent or comparable, so making a decision requires precise information about the entity under analysis. Identifying the limits of the data in detail, so that each associated attribute can be quantified, is important for detailed knowledge. Hence, characterizing data by their abstract and physical properties is useful for detecting and preventing erroneous results. Instead of relying on mere intuition, data-driven decision making bases decisions on the analysis of data. For this reason, monitoring data collection, processing, and preservation are important aspects of data quality. Data replication draws on organizational memory, together with case-based reasoning, to bring organizational knowledge into the decision-making procedure (Brynjolfsson, Hitt and Kim, 2011).
ANS: Firms' managers make effective decisions by relying on data-based analytics rather than on instinct. Organizations gather and disseminate detailed knowledge about suppliers, consumers, and partners. One of the major reasons for this trend is a pervasive organizational culture that encourages the collection and processing of data as a routine part of operations (Robbins and Judge, 2012). Correspondingly, generating a wealth of data on customer behavior supports the sustainability of the organization through detailed analysis of that behavior. The information-processing view of organizations suggests that precise information facilitates accurate decision making and higher firm performance. The growing evidence of the positive impact of culture-driven data analysis holds true in specific situations (Zheng, Yang and McLean, 2010).
The extraction of information from business data is an important aspect of decision making, but very few companies have been able to seize this information advantage. Most of the organizations that fall short not only lack the technical capabilities required to use their data but have also failed to build an organizational culture that contributes to growth. Regardless of the number of data scientists in an organization, effective use of data requires a business-aligned culture that supports the way people access and work with it. The company culture should hold that every business decision ought to rely on data. An analytics-driven corporate culture is characterized by the attitudes and assumptions of an organization that embraces improvements suggested by data (Popovič et al., 2012). Employees can manage data independently, data analysis is part of everyone's job, and every employee has access to the data. Unfortunately, analytics is often used to support the organization's current state and conventional decisions rather than to steer it toward innovation. Therefore, the challenge of implementing an analytics culture lies not mainly in an organization's technical domain but in its human-resources domain, that is, in the people who work for the organization. The changes that accompany a new way of doing business should be addressed first and foremost from the employee side.
Key results of exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for housing.csv
The challenges of using big data in senior management decision making are both subtle and significant. One of the most fundamental aspects of data-driven decision making is its effect on the decisions made and on the people implementing them. When data are expensive, scarce, and not readily available in electronic form, it makes sense for individuals to decide on the basis of their experience and the relationships they have observed. "Intuition" is the label given to this style of inference and decision making. People in the organization state their opinions about what the future holds, what will happen, or how well something will work, and then plan accordingly. For particularly important decisions, these individuals are typically senior in the organization, and their expertise and track records justify their role (McAfee et al., 2012).
Using the house price data set (housing.csv), exploratory data analysis (EDA) was carried out in RapidMiner Studio. In the EDA process, the data file was loaded with the Retrieve operator and connected to the Select Attributes operator. This was linked to the Replace Missing Values operator to handle missing values in the EDA process (the operator replaces missing values rather than excluding them). The variable Ocean Proximity was excluded from this step because descriptive statistics such as the mean and standard deviation are redundant for a nominal variable. The output of the Replace Missing Values operator was connected to the result port of the process window to display the descriptive statistics.
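The descriptive statistics reported by RapidMiner's Statistics tab can be approximated in plain Python, which may help when cross-checking Table 2.1. This is a minimal sketch, not RapidMiner's implementation: it assumes numeric columns, treats empty cells as missing, and the function names (describe_column, replace_missing_with_mean, eda) and column names (housing_median_age, total_bedrooms, ocean_proximity) are hypothetical. Mean imputation is used as an assumed stand-in for the Replace Missing Values operator's average setting.

```python
import csv
import io
import statistics


def describe_column(values):
    """Summary statistics for one numeric column; empty cells count as missing."""
    present = [float(v) for v in values if v.strip() != ""]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "median": statistics.median(present),
        "mode": statistics.mode(present),  # first mode if several are tied (Python 3.8+)
    }


def replace_missing_with_mean(values):
    """Fill empty cells with the column mean (assumed analogue of mean imputation)."""
    present = [float(v) for v in values if v.strip() != ""]
    fill = statistics.mean(present)
    return [float(v) if v.strip() != "" else fill for v in values]


def eda(csv_text, numeric_columns):
    """Describe each named numeric column of a CSV supplied as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: describe_column([row[col] for row in rows]) for col in numeric_columns}


# Usage with a few made-up rows (illustrative only, not values from housing.csv);
# the column names are assumptions.
sample = (
    "housing_median_age,total_bedrooms,ocean_proximity\n"
    "41,129,NEAR BAY\n"
    "21,,NEAR BAY\n"
    "52,190,INLAND\n"
    "52,235,INLAND\n"
)
for column, stats in eda(sample, ["housing_median_age", "total_bedrooms"]).items():
    print(column, stats)
```

Invalid (non-numeric) entries would raise a ValueError in this sketch, which is a crude but visible way of flagging them.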
(ii)
Exploratory Data Analysis of the Attributes
(iii) The average median house value (M = $206,855.82, SD = $115,395.62) was much higher than the median (Mdn = $179,700), indicating a right-skewed distribution of house values. The mode of the distribution was the maximum value in the data, which suggests a group of wealthier buyers purchasing costlier houses in prime locations, although this peak also reflects the fact that house values in housing.csv are capped at $500,001. For income, the average (M = $38,700) was greater than the median, reflecting a right-skewed income distribution. Houses were not very old, with an average age of 28.64 years, and the age distribution was approximately normal. Ocean Proximity was an important predictor of house value; Figure 3 shows the average house value by proximity to the ocean. Figure 4 shows the relationship between income and house values, which was positive, so income was expected to be an important predictor of house value. This was confirmed with Pearson's correlation, and the results are in Table 2.1.1. Latitude and longitude are geographical coordinates and were excluded from the list of candidate predictors of house value. Five variables, median income, total rooms, house median age, ocean proximity and number of households, were identified from the correlation analysis as the most likely predictors. Chung Chun Lin (2013) estimated real estate prices using similar factors, and in Malaysia, Shiau Hui Kok (2018) analysed the factors predicting real estate prices over the period 2002 to 2015. A chi-square test of independence was conducted between ocean proximity and house value category, a nominal attribute with three levels created from quartile values. Results have been presented in Table .
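The correlation check (Table 2.1.1) and the chi-square test were run in RapidMiner; the calculations underneath them can be sketched as below. This is an illustrative sketch with hypothetical function names, not RapidMiner's code. It returns only the chi-square statistic and its degrees of freedom, so the statistic would be compared with a critical value (for example 3.841 for 1 degree of freedom at the 0.05 level) or passed to a statistics library to obtain a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Usage with made-up numbers (illustrative only):
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 8]))
print(chi_square_independence([[10, 20], [20, 10]]))  # statistic about 6.67, 1 degree of freedom
```

For the real test, each row of the table would be an ocean proximity category and each column a house value category.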
Key results of top 5 variables for predicting house values and their relationship with house values
The linear regression model with five predictors was designed to predict the median house value. A significant relationship was detected between the median house value and the five predictors. The regression process was constructed as in Figure 4. The Nominal to Numerical operator was used to code the levels of the ocean proximity attribute, and missing values were handled with the Replace Missing Values operator.
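RapidMiner's Linear Regression operator fits the model by least squares. The core of that fit can be sketched with the normal equations, (X'X)b = X'y, as below. This is a simplified illustration (no feature selection, regularisation or standard errors) that assumes all predictors are already numeric, for example after Nominal to Numerical, with missing values replaced; fit_ols is a hypothetical name.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        if abs(p) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Usage on made-up data generated from y = 1 + 2*x1 + 3*x2 (illustrative only):
print(fit_ols([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8]))
```

Solving the normal equations directly is fine for a handful of predictors; statistical packages typically use QR or similar decompositions for better numerical stability.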
Table 2.2: Results of Final Linear Regression Model for Task 2.2 for housing.csv Data Set
- The final regression model was found as follows,
Median House Price = - 26823.72 -8307.02 * Ocean Proximity + 1752.40 * House Median Age – 16.99 * Total Rooms + 123.07 * Households + 46233.91 * Median Income.
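To show how the unstandardised equation is applied, the sketch below plugs predictor values into the reported intercept and coefficients. The dictionary keys are hypothetical attribute names; the ocean proximity value must use the same numeric coding that the Nominal to Numerical step produced, which is not stated here, and every input must be in the data set's own units.

```python
# Intercept and coefficients copied from the final (unstandardised) model equation.
INTERCEPT = -26823.72
COEFFICIENTS = {
    "ocean_proximity": -8307.02,  # numeric code from Nominal to Numerical (coding assumed)
    "housing_median_age": 1752.40,
    "total_rooms": -16.99,
    "households": 123.07,
    "median_income": 46233.91,
}


def predict_median_house_price(features):
    """Intercept plus the sum of coefficient * value over the five predictors."""
    return INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())


# Usage: two hypothetical inputs that differ only by one unit of median income.
base = {name: 0.0 for name in COEFFICIENTS}
higher_income = dict(base, median_income=1.0)
print(predict_median_house_price(higher_income) - predict_median_house_price(base))
```

The difference printed is the median income coefficient, which is the usual reading of a regression slope: the change in predicted value per one-unit change in that predictor with the others held fixed.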
The standardized regression model was evaluated without intercept as,
Median House Price = - 0.06 * Ocean Proximity + 0.19 * House Median Age – 0.32 * Total Rooms + 0.41 * Households + 0.76 * Median Income.
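The two equations are linked by a rescaling: a standardised coefficient equals the unstandardised coefficient multiplied by the ratio of the predictor's standard deviation to the standard deviation of house value, b*_j = b_j × s_xj / s_y. Because standardised variables have mean zero, the standardised equation has no intercept. The sketch below applies the formula; the function names are hypothetical, and the predictor standard deviations are not reported in this section, so they would have to be taken from the EDA statistics.

```python
def standardised_coefficient(b, sd_x, sd_y):
    """Convert an unstandardised slope b into a standardised (beta) coefficient."""
    return b * sd_x / sd_y


def standardise_all(coefficients, sd_predictors, sd_response):
    """Apply the conversion to every predictor; both dicts must share the same keys."""
    return {
        name: standardised_coefficient(b, sd_predictors[name], sd_response)
        for name, b in coefficients.items()
    }


# Usage with made-up numbers (illustrative only):
print(standardised_coefficient(2.0, 3.0, 6.0))
```

Standardised coefficients put predictors measured in different units (rooms, years, income) on a common scale, which is why they are used to rank the predictors' relative importance.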
Ocean Proximity was a significant predictor (t = -12.46, p < 0.05), with properties near the ocean or coastline being more costly. Median Income was the strongest positive significant predictor (t = 139.54, p < 0.05) of house values. In the Malaysian context, houses with adequate social facilities were very popular among buyers (Mohit, Ibrahim and Rashid, 2010); those authors used multiple regression analysis, consistent with the approach taken in this work. Total Rooms was a significant predictor (t = -23.17, p < 0.05), but its negative coefficient indicates that, holding the other predictors constant, more expensive properties did not have an excessive number of rooms. The number of households in the area was also a significant predictor (t = 30.39, p < 0.05): areas with fewer households had lower house values. The negative intercept is the predicted value when all five predictors are zero, a point outside the range of the data, so it should not be interpreted literally. House age had a positive and significant effect (t = 36.75, p < 0.05) on house values. A house with many rooms is likely to command a higher value than a single-bedroom cottage. Houses in island zones cost more than those in urban regions, and an area of outstanding beauty and scenery adds further to a house's value (Iacoviello and Neri, 2010).
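Each t value is the coefficient divided by its standard error, and its p-value comes from the t distribution. With a data set of many thousands of rows, the t distribution is very close to the standard normal, so the sketch below uses a normal approximation via math.erfc to check the significance levels quoted above. The approximation and the function names are assumptions of this sketch, not RapidMiner's exact computation.

```python
import math


def t_statistic(coefficient, standard_error):
    """t = estimated coefficient / its standard error."""
    return coefficient / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value using the standard normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# t values reported for the final model
reported_t = {
    "Ocean Proximity": -12.46,
    "Total Rooms": -23.17,
    "Households": 30.39,
    "House Median Age": 36.75,
    "Median Income": 139.54,
}
for name, t in reported_t.items():
    print(f"{name}: t = {t}, significant at 0.05: {two_sided_p_normal(t) < 0.05}")
```

For t values this large the p-values are far below 0.05 (the largest underflow to 0.0 in floating point), so reporting them as p < 0.05 is conservative.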
Text Table for House Attributes with Ocean Proximity
(1) Tableau Desktop was used to create a text table view of the data. The housing.csv data set was connected to Tableau, and a new sheet was created for the text table. Ocean Proximity was placed in the rows, with the other attributes in the columns, each aggregated as an average.
(2) Average house values were highest on islands, followed by areas close to the ocean, and lowest inland. The average age of houses was also greater on islands, whereas houses inland were comparatively new. Because of the high cost of living, the average population was lower on islands. People preferred places within one hour's drive of the ocean coastline. Average population was highest near inland areas, with average population values ranging from 1354 to 1520. The average number of total bedrooms was lower for the exclusive properties on islands, whereas the highest average (M = 547) was for properties within one hour of the ocean. The trend in average total rooms followed that of average total bedrooms.
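The text table is essentially a group-by: one row per Ocean Proximity category with the average (AVG) of each measure. A minimal Python equivalent is sketched below for cross-checking the Tableau figures; the function and column names are assumptions, and unlike Tableau's AVG it does not skip missing values, so those should be replaced or removed first.

```python
from collections import defaultdict


def average_by_category(rows, category, measures):
    """Average each measure within each category, like a Tableau text table using AVG()."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        key = row[category]
        counts[key] += 1
        for measure in measures:
            totals[key][measure] += row[measure]
    return {
        key: {measure: totals[key][measure] / counts[key] for measure in measures}
        for key in counts
    }


# Usage with made-up rows (illustrative only):
rows = [
    {"ocean_proximity": "ISLAND", "median_house_value": 400000.0},
    {"ocean_proximity": "ISLAND", "median_house_value": 300000.0},
    {"ocean_proximity": "INLAND", "median_house_value": 100000.0},
]
print(average_by_category(rows, "ocean_proximity", ["median_house_value"]))
```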
Average House Value with Ocean Proximity
- A Geomap Graph view of house value by ocean proximity and median age was constructed in Tableau Desktop 10.5 using the average latitude and longitude from the measures. Zip codes were not provided, so longitude and latitude were placed on the Columns and Rows shelves respectively and plotted on a map of the world. The software automatically detected the location of the data. Figure 8 shows the geomap of average median house values in the State of California, USA. The latitude and longitude fields were assigned their corresponding geographic roles.
- The geomap is consistent with the earlier analysis. The ocean proximity categories are represented by five colors, as can be seen in Figure 8. House values were higher on islands and near the ocean, and houses in serene, attractive surroundings were in greater demand.
Reference List
Brynjolfsson, E., Hitt, L. and Kim, H., 2011. Strength in numbers: how does data-driven decision making affect firm performance? SSRN Electronic Journal.
Iacoviello, M. and Neri, S., 2010. Housing market spillovers: evidence from an estimated DSGE model. American Economic Journal: Macroeconomics, 2(2), pp.125-64.
Kok, S.H., Ismail, N.W. and Lee, C., 2018. The sources of house price changes in Malaysia. International Journal of Housing Markets and Analysis, 11(2), pp.335-355.
Linnenluecke, M.K. and Griffiths, A., 2010. Corporate sustainability and organizational culture. Journal of world business, 45(4), pp.357-366.
McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil, D.J. and Barton, D., 2012. Big data: the management revolution. Harvard business review, 90(10), pp.60-68.
Popovič, A., Hackney, R., Coelho, P.S. and Jaklič, J., 2012. Towards business intelligence systems success: Effects of maturity and culture on analytical decision making. Decision Support Systems, 54(1), pp.729-739.
Robbins, S.P. and Judge, T., 2012. Essentials of organizational behavior.
Zheng, W., Yang, B. and McLean, G.N., 2010. Linking organizational culture, structure, strategy, and organizational effectiveness: Mediating role of knowledge management. Journal of Business Research, 63(7), pp.763-771.