Variable Description
BI Data Source Description and BI Requirements
In the current work, we chose to work with historical sales data. In particular, the data records the sales made by each store on each day, together with whether a promotion was running. At the source, the data was stored in two standard Comma-Separated Values (CSV) files: the sales data, containing historical sales, customer counts, and other day-wise information; and the store data, containing supplementary information about the stores. Both datasets can be obtained from https://www.kaggle.com/archis777/store-sales-historical-data. Table 1 below describes the attributes in the two datasets.
Table 1: Variable description
Attribute | Description
--- | ---
Store | A unique ID for each store
Sales | The turnover for a given day
Customers | The number of customers on a given day
Open | Whether the store was open: 0 = closed, 1 = open
StateHoliday | Indicates a state holiday. Normally all stores, with few exceptions, are closed on state holidays. Note that all schools are closed on public holidays and weekends. a = public holiday, b = Easter holiday, c = Christmas, 0 = none
SchoolHoliday | Whether the (Store, Date) was affected by the closure of public schools: 1 = schools closed, 0 = schools open
StoreType | Differentiates between four store models: a, b, c, d
Assortment | The assortment level: a = basic, b = extra, c = extended
CompetitionDistance | Distance in meters to the nearest competitor store
CompetitionOpenSince[Month/Year] | The approximate year and month in which the nearest competitor opened
Promo | Whether a store is running a promo on that day
Promo2 | A continuing and consecutive promotion for some stores: 0 = store is not participating, 1 = store is participating
Promo2Since[Year/Week] | The year and calendar week when the store started participating in Promo2
PromoInterval | The consecutive intervals in which Promo2 is started, naming the months in which the promotion starts anew. E.g., "Feb, May, Aug, Nov" means each round starts in February, May, August, and November of any given year for that store
Tables 2 and 3 below show the first 15 observations of the sales and store datasets, respectively.
Table 2: Overview of the sales data
Table 3: Overview of the store data
After an initial exploration of the data, the fundamental business process to be measured is how factors such as promotions and holidays affect the sales generated by the business: how a store performs when it is running a promotion, and how sales differ between holidays and non-holiday periods. Primarily, this will show how well the promotions run by the business perform in terms of increasing sales.
Business Problems/Questions
In the current work, we seek to answer the following business questions:
- Which store model has the highest average sales?
- On average, do stores that are near their competitors have lower sales?
- What is the effect of running a promotion on sales recorded by a given store?
- Which assortment level corresponds to the highest average sales?
- Does having the store open lead to a variation in sales? That is, on average, are sales higher on days when a given store is open than on days when it is closed?
- What is the relationship between the number of customers and the sales made by a given store?
- Do school holidays influence the number of customers visiting a given store?
- What is the average spending per customer?
- Which store type is most frequented by customers? Does the frequency of customers to such stores influence the overall sales made by a given store?
- What is the distribution of sales per week? Are there any days that have relatively high or low sales?
Key Users of The Data/Report
Since the data includes sales made during an ongoing marketing campaign, one key group of users of the data and the resulting report is marketing analysts. Business owners would also use the data to understand the performance of their business, including how factors such as holidays affect demand, whether the marketing campaigns adopted by the business are effective, and, if not, what the best alternative is. Shareholders are another key group: they would use the report to assess business performance and the current position of the business when deciding whether to continue investing in it.
Need for this information
Understanding historical business performance is often a key determinant of the direction a business adopts. The information proposed in this report, which focuses on understanding business performance, would therefore serve as decision support for business executives on matters such as the effectiveness of marketing campaigns and business policies like opening during holidays. It is also important to determine which factors affect business performance and to what extent. Hence the need for this information.
To make the data usable, we preprocessed each of the two datasets as described in the following subsections.
Store Data
Table 3 above shows that the store data contains some missing observations. Our first step was therefore to select only the attributes relevant to the business questions defined above: Store (the ID attribute used to merge the two datasets), StoreType, Assortment, CompetitionDistance, and Promo2.
Table 4: Overview of the original store table with all attributes
Table 5 below provides an overview of the selected variables from the store data.
Table 5: Selected attributes
For the selected attributes, we replaced any erroneous entries with empty values, which are treated as missing observations, and then dropped the rows containing missing values. After confirming that the data types matched the data entries and that no missing observations remained, we saved the data for further processing. Table 6 below shows the preprocessing steps conducted on the store data.
Table 6: Query settings for the store data
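As a rough, tool-independent equivalent of the store-data steps above, the sketch below keeps only the five selected attributes, treats any value that cannot be parsed as the expected type as missing, and drops incomplete rows. The validity rules (numeric CompetitionDistance, 0/1 Promo2, letter codes for StoreType and Assortment) and the function names `_parse` and `clean_store_rows` are assumptions for illustration, not the original query steps.

```python
import csv
import io

# Attributes kept from the store data (as listed in the text).
SELECTED = ["Store", "StoreType", "Assortment", "CompetitionDistance", "Promo2"]


def _parse(field, raw):
    """Return the parsed value, or None if the entry is empty or erroneous.

    The per-field rules below are illustrative assumptions.
    """
    raw = (raw or "").strip()
    if raw == "":
        return None
    try:
        if field == "Store":
            return int(raw)
        if field == "CompetitionDistance":
            return float(raw)
        if field == "Promo2":
            value = int(raw)
            return value if value in (0, 1) else None
    except ValueError:
        return None  # erroneous entry becomes a missing value
    if field == "StoreType":
        return raw if raw in {"a", "b", "c", "d"} else None
    if field == "Assortment":
        return raw if raw in {"a", "b", "c"} else None
    return None


def clean_store_rows(csv_text):
    """Select the attributes, turn errors into missing values, drop incomplete rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = {field: _parse(field, row.get(field)) for field in SELECTED}
        if all(value is not None for value in parsed.values()):
            cleaned.append(parsed)
    return cleaned
```

Selecting the attributes before dropping missing rows matters: a column that is legitimately blank for some stores, such as PromoInterval for stores not taking part in Promo2, would otherwise cause those stores to be discarded.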
Sales Data
A brief exploration of the sales data showed that it contains 1,017,209 observations, i.e., more than one million daily store records over the examination period. Table 7 below provides an overview of the sales data.
Table 7: Overview of the sales data
While there were no missing observations in the sales data, there were about 37,000 errors in the StateHoliday attribute, as shown in Table 8.
Table 8: Overview of the column with errors
Since none of the business questions requires StateHoliday, we dropped the column from the final data, leaving 8 variables, as shown in Table 9 below.
Table 9: Overview of the cleaned sales data
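The report does not state what produced the StateHoliday errors. One plausible cause, offered only as an assumption, is that the column mixes the numeric code 0 with the letter codes a, b and c, so converting it to a whole-number type fails on every holiday row. The sketch below counts such conversion failures and then drops the column; `count_integer_errors` and `drop_column` are hypothetical helpers.

```python
def count_integer_errors(rows, column):
    """Count rows whose value in `column` cannot be converted to an integer."""
    errors = 0
    for row in rows:
        try:
            int(row[column])
        except (ValueError, TypeError):
            errors += 1  # e.g. the letter codes "a", "b", "c"
    return errors


def drop_column(rows, column):
    """Return copies of the rows without `column`."""
    return [{key: value for key, value in row.items() if key != column} for row in rows]
```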
Merging Data
To obtain the complete data, we merged the two datasets using the Store attribute as the key, as shown below.
Table 10
Using a left outer join (all rows from the first table, matching rows from the second), 1,014,567 of the 1,017,209 sales observations were matched to a store record. We then removed the store.1 attribute, i.e., the Store column from the store data, since it duplicates the Store attribute from the sales data. Table 11 shows the queries used to preprocess the sales data and merge the two datasets.
Table 11
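To make the merge logic concrete, here is a minimal sketch of a left outer join keyed on Store, written with plain Python dictionaries rather than the BI tool's merge dialog. It keeps every sales row, attaches the store attributes when a match exists, counts how many rows matched, and takes the key only from the sales side, which is what removing store.1 achieves. The function name `left_outer_join` is illustrative. The 2,642 unmatched rows (1,017,209 minus 1,014,567) presumably belong to stores dropped during store-data cleaning; that is an inference, not something the report states.

```python
def left_outer_join(sales_rows, store_rows, key="Store"):
    """Attach store attributes to every sales row; unmatched rows get None values.

    Returns (merged_rows, number_of_matched_rows).
    """
    stores_by_key = {row[key]: row for row in store_rows}
    store_columns = [c for c in (store_rows[0].keys() if store_rows else []) if c != key]
    merged, matched = [], 0
    for sale in sales_rows:
        store = stores_by_key.get(sale[key])
        if store is not None:
            matched += 1
        combined = dict(sale)
        for column in store_columns:
            # The key comes only from the sales side, so no duplicate key column appears.
            combined[column] = store[column] if store is not None else None
        merged.append(combined)
    return merged, matched
```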
Although the data does not strictly require schema modeling, we generated a few dimensions to demonstrate the application of the star schema to the current problem. To address our business objectives, we need the following facts and dimensions:
- Sales information: facts
- Store type and assortment information: dimension
- Type of day (whether a day is a holiday, and the day of the week): dimension
- Other store information, including distance to competition: dimension
To this end, we split the data into separate tables holding the relevant information and linked each of them to the fact table, as documented below.
Store type-Assortment
Table 12 below shows the levels of store types and assortment segments that we have generated.
Table 12
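As a rough illustration of how a dimension such as the one in Table 12 can be derived, the sketch below collects the distinct (StoreType, Assortment) pairs, gives each a surrogate key, and replaces the two attributes in the fact rows with that key. The key name `StoreTypeAssortmentKey` and the helper `build_dimension` are hypothetical; the report does not say how its keys were assigned.

```python
def build_dimension(fact_rows, attributes, key_name):
    """Split `attributes` out of the fact rows into a dimension table with surrogate keys.

    Returns (dimension_rows, fact_rows_with_key).
    """
    dimension = {}  # maps a tuple of attribute values -> surrogate key
    new_facts = []
    for row in fact_rows:
        combo = tuple(row[a] for a in attributes)
        if combo not in dimension:
            dimension[combo] = len(dimension) + 1  # keys assigned in order of first appearance
        fact = {k: v for k, v in row.items() if k not in attributes}
        fact[key_name] = dimension[combo]
        new_facts.append(fact)
    dim_table = [
        {key_name: key, **dict(zip(attributes, combo))}
        for combo, key in sorted(dimension.items(), key=lambda item: item[1])
    ]
    return dim_table, new_facts
```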
Figure 1 below shows the resulting star schema for the business problem.
Figure 1: Star schema for the sales facts and dimensions
Task: Implementation of a Business Intelligence Solution
Business performance is often hypothesized to affect the behavior of both managers and employees. According to Van Looy and Shafagatova (2016), the measurement of business performance is a fundamental issue in both academia and business, since organizations are required to continually achieve effective and efficient results. In practice, the performance evaluation metrics a business uses, i.e., the measures by which it evaluates its performance, usually depend on its strategy, so the choice of metrics is specific to the business in question (Van Looy & Shafagatova, 2016). Business process performance measurement, often shortened to process performance measurement (PPM), is designed to help determine whether strategic and operational objectives are achieved and to support decision-making for the ongoing optimization of business processes (Adela & Ruiz-Cortés, 2018).
In their study, Van Looy and Shafagatova (2016) argue that business performance, in particular that of internal business processes, is evaluated from various perspectives, including the financial and customer perspectives. On the financial perspective, they note that it refers to "… sales or revenues gained while doing business, particularly after executing business processes."
Over the years, the relationship between marketing and business performance has attracted considerable interest from academia and business alike (Morgan, 2012). According to Morgan (2012), both academics and managers have endeavored to demystify the effect of marketing on business performance and, in particular, how performance differs across businesses as a result of their marketing.
Overall, the objective of the current work is to examine the performance of the business with respect to the specified questions, i.e., how sales differ under the conditions each question specifies. To address this objective, the following questions will be answered:
- Which store model has the highest average sales?
- On average, do stores that are near their competitors have lower sales?
- What is the effect of running a promotion on sales recorded by a given store?
- Which assortment level corresponds to the highest average sales?
- Does having the store open lead to a variation in sales? That is, on average, are sales higher on days when a given store is open than on days when it is closed?
- What is the relationship between the number of customers and the sales made by a given store?
- Do school holidays influence the number of customers visiting a given store?
- What is the average spending per customer?
- Which store type is most frequented by customers? Does the frequency of customers to such stores influence the overall sales made by a given store?
- What is the distribution of sales per week? Are there any days that have relatively high or low sales?
The following section includes the findings generated from the analysis aimed at answering the business questions specified above.
Which store model has the highest average sales?
Figure 2 below shows the average sales per store type, from which we note that, on average, store type b has the highest average sales.
Figure 2: Average sales per store
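Most comparisons in this section (store type, promotion, assortment, school holidays) are averages of one measure grouped by one attribute, which is what the chart visuals display. A minimal standard-library sketch of that aggregation, with the hypothetical helper name `average_by`, is:

```python
from collections import defaultdict


def average_by(rows, group_field, value_field):
    """Return {group: mean of value_field} over the given rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
        counts[row[group_field]] += 1
    return {group: totals[group] / counts[group] for group in totals}
```

For example, `average_by(rows, "StoreType", "Sales")` gives the per-store-type averages behind a chart like Figure 2. Whether closed days (Sales = 0) are filtered out first changes every average, so that choice should be applied consistently across charts.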
On average, do stores near their competitors have lower sales?
To determine whether a store is near its competitor, we used a DAX formula to define a new calculated column that groups stores by the distance to their closest competitor: stores at most 300 meters away are considered very close; those above 300 and up to 1,000 meters, close; those above 1,000 and up to 5,000 meters, average; those above 5,000 and up to 10,000 meters, far; and those over 10,000 meters, very far.
The following formula was used, shown here in a simplified form that also leaves blank distances unclassified:
Proximity =
// In a calculated column, the column reference returns the current row's value
VAR Distance = store_data[CompetitionDistance]
RETURN
    SWITCH (
        TRUE (),
        ISBLANK ( Distance ), BLANK (), // no band for missing distances
        Distance <= 300, "Very Close",
        Distance <= 1000, "Close",
        Distance <= 5000, "Average",
        Distance <= 10000, "Far",
        "Very Far"
    )
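The same thresholds can be checked outside the BI tool. The Python function below is a hypothetical mirror of the DAX column, using the upper bounds from the formula (300, 1,000, 5,000 and 10,000 meters, each inclusive); it is meant for testing the band boundaries, not as part of the original solution.

```python
def proximity_band(distance_m):
    """Map a competitor distance in meters to the proximity band used in the report."""
    if distance_m is None:
        return None  # missing distance: no band assigned
    if distance_m <= 300:
        return "Very Close"
    if distance_m <= 1000:
        return "Close"
    if distance_m <= 5000:
        return "Average"
    if distance_m <= 10000:
        return "Far"
    return "Very Far"
```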
Figure 3 below shows the average sales by proximity to competition.
Figure 3: Average sales per store proximity to competition
From Figure 3, we note that stores that are very close to their competitors have the highest average sales (24.24%), while those that are far from their competitors have the lowest.
Effect of running a promotion on sales
Figure 4 below shows that on days when a given store was running a promotion, it made more sales on average than on days when it was not.
Figure 4: Average sales per promotion type
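Figure 4 compares the two averages visually; the size of the effect can also be expressed as a relative uplift, (mean sales with promo / mean sales without promo) minus 1. The sketch below computes it on open days only, an assumption made so that closed days with zero sales do not drag both averages down; `promo_uplift` is a hypothetical name, and the report itself gives no uplift figure.

```python
from statistics import mean


def promo_uplift(rows):
    """Relative difference in mean sales between promo and non-promo open days."""
    open_days = [r for r in rows if r["Open"] == 1]
    with_promo = [r["Sales"] for r in open_days if r["Promo"] == 1]
    without_promo = [r["Sales"] for r in open_days if r["Promo"] == 0]
    if not with_promo or not without_promo:
        raise ValueError("need open days both with and without a promotion")
    return mean(with_promo) / mean(without_promo) - 1
```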
Which assortment level corresponds to the highest average sales?
Figure 5
Stores with the extra assortment have the highest average sales (42.56%), compared with stores offering the basic and extended assortments.
Does having the store open lead to a variation in sales?
Figure 6 below shows that closed stores recorded no sales, indicating that closing a store directly reduces its overall sales.
Figure 6
What is the relationship between the number of customers and the sales made by a given store?
Examining the relationship between the number of customers and the sales made by a given store, we note that an increase in the number of customers tends to correspond to an increase in sales (see Figure 7).
Figure 7: The relationship between sales and customers
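Figure 7 shows this relationship graphically. A single summary number for it is the Pearson correlation coefficient; the report does not compute one, so the sketch below is an optional addition. It drops closed days, where both values are zero, on the assumption that they would otherwise strengthen the apparent relationship. `pearson` and `customers_sales_correlation` are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def customers_sales_correlation(rows):
    """Correlation between Customers and Sales, restricted to open days."""
    open_days = [r for r in rows if r["Open"] == 1]
    return pearson([r["Customers"] for r in open_days], [r["Sales"] for r in open_days])
```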
Do school holidays influence the number of customers visiting a given store?
During school holidays, stores have a slightly higher average number of customers than on other days, indicating that school holidays have a slightly positive effect on the number of customers visiting the stores (see Figure 8).
Figure 8: Average number of customers
What is the average spending per customer?
Figure 9 below shows the average sales per customer, i.e., the average spending per customer.
Figure 9: Average sales
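The report does not state exactly how spending per customer was computed. One common definition, used here as an assumption, is total sales divided by total customers over days with at least one customer; averaging the daily Sales/Customers ratios instead weights every day equally and generally gives a different number. A sketch of both, with hypothetical function names:

```python
def spend_per_customer(rows):
    """Total sales divided by total customers (days with no customers excluded)."""
    days = [r for r in rows if r["Customers"] > 0]
    total_customers = sum(r["Customers"] for r in days)
    if total_customers == 0:
        raise ValueError("no customers recorded")
    return sum(r["Sales"] for r in days) / total_customers


def mean_daily_spend(rows):
    """Average of the daily Sales/Customers ratios (days with no customers excluded)."""
    ratios = [r["Sales"] / r["Customers"] for r in rows if r["Customers"] > 0]
    if not ratios:
        raise ValueError("no customers recorded")
    return sum(ratios) / len(ratios)
```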
Which store type is most frequented by customers?
Figure 10
Overall, Figure 10 shows that store type b has the highest average number of customers as well as the highest average sales, while store type d has the lowest average number of customers and the lowest average sales. This suggests that the more customers a store attracts, the higher its average sales.
What is the distribution of sales per week? Are there any days that have relatively high or low sales?
Table 13 shows that, averaged across all days, a store makes approximately 5,773.83 in sales per day, with the highest average sales on Monday (day 1).
Table 13
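The day-of-week breakdown in Table 13 can be reproduced by averaging Sales per DayOfWeek value (1 = Monday, consistent with "Monday (day 1)" above) and picking the extremes. The helper name `weekday_profile` is hypothetical.

```python
from collections import defaultdict


def weekday_profile(rows):
    """Return (mean sales per DayOfWeek, day with highest mean, day with lowest mean)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["DayOfWeek"]] += row["Sales"]
        counts[row["DayOfWeek"]] += 1
    means = {day: totals[day] / counts[day] for day in sorted(totals)}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return means, best, worst
```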
Figures 11 and 12 below show the two dashboards containing the visuals discussed in the previous sections: Figure 11 covers questions 1 to 6, and Figure 12 covers questions 7 to 10.
Figure 11: Dashboard 1
Figure 12: Dashboard 2
Based on our analysis findings, the following were noted:
- Running a promotion increases the average sales made by a business
- Stores receive slightly more customers during school holidays
- Monday has the highest average sales
- An increase in the number of customers corresponds to an increase in sales
- Stores with the extra assortment, as well as stores operating with model b, have the highest average sales
- Proximity to competition has little to no effect on differences in sales
Conclusions and Recommendations
Business performance is often influenced by various factors, ranging from internal factors such as strategic planning and management to external factors such as competition. In the current work, for instance, the financial performance of the stores is associated with, among other factors, running a promotion (stores had higher average sales on promotion days) and school holidays, which, as established above, bring more customers who in turn generate more revenue. Interestingly, competition had little effect on the sales made by a store. Moreover, stores that were closer to their competitors had higher average sales than those that were far from them, indicating that, if anything, nearby competition coincided with better business performance.
Building on the findings of the current work and the conclusion made above, the following brief recommendations are made regarding the improvement of business performance:
- Since no sales were made when the stores were closed, the business appears to have no online channel through which customers can buy for later delivery. We therefore propose that the business consider expanding its online presence with an e-store where customers can make purchases for later delivery.
- More customers visit the stores during school holidays, suggesting that a considerable share of customers may be students or parents of students. The business should therefore adopt marketing campaigns targeting this segment.
- The business should introduce shifts so that the stores can stay open for longer hours. This would allow the business to capture sales that are otherwise lost while the stores are closed.
References
Adela, D.-R.-O. & Ruiz-Cortés, M. R. A., 2018. Business Process Performance Measurement. In: S. Sakr & A. Zomaya, eds. Encyclopedia of Big Data Technologies. Cham: Springer.
Morgan, N., 2012. Marketing and business performance. Journal of the Academy of Marketing Science, 40(1), pp. 102-119.
Van Looy, A. & Shafagatova, A., 2016. Business process performance measurement: a structured literature review of indicators, measures and metrics. SpringerPlus, 5, p. 1796.