1: Importing and Preparing Orders Table
4: Building the Customer Detail Page
5: Creating an Executive Summary Page
6: Creating a Product Detail Page
8: Cross Filtering and Navigation
9: Creating Metrics for Users Outside the Company Using SQL
Note: As I am a Mac user, I first had to create a Windows VM for this project. These are the steps I followed in order to do this:
- Access Azure
- Create a Windows Virtual Machine with the size D2s_v3. This VM costs ~$85/month but the Azure free trial offers a free $200 credit to spend on any Azure service. So as long as I finished the project within the propose timeline, I would not incur any additional charges.
- Connect my VM to my local machine. I did this by Utilising Microsoft Remote Desktop.
Below are three different dataframes that I needed to import and transform as part of my project. Each section reveals the steps I took to do this.
-
Connect to Azure SQL Database:
- Utilising the provided Azure SQL Database credentials, I connected to the database and imported the
orders_powerbi
table into Power BI using the Import option.
- Utilising the provided Azure SQL Database credentials, I connected to the database and imported the
-
Data Privacy and Integrity:
- In Power Query Editor,I deleted the
Card Number
column to ensure data privacy. - I then split the
Order Date
andShipping Date
columns into date and time components. - Next, I filtered out rows with missing or null values in the
Order Date
column to maintain data integrity. - Lastly, I renamed columns according to Power BI conventions for clarity and consistency.
- In Power Query Editor,I deleted the
-
Import Products.csv:
- I downloaded the
Products.csv
file and imported it into Power BI using the Get Data option. - I then used 'Remove Duplicates' on the
product_code
column to ensure uniqueness.
- I downloaded the
-
Clean and Transform Weight Column:
- Using 'Columns from Example' in the Power Query Editor, I separated the
Weight
column to create a new column calledWeightValue
with just the weight value. - Using the same method, I then separated the Weight Units from the
Weight
column, and put it into a new column calledWeightUnit
. - Any blank entries in the
WeightUnit
column, I replaced with "kg" - I then converted the
WeightValue
column to decimal; replace any errors with 1. - I then created a column called
WeightMultiplier
that contained split values from theWeight
column that had expressions such as2x200g
. This would allow me to correctly transformWeight
values that had mathematical expressions. Any values in the newWeightMultiplier
that were blank or null were replaced with 1, as they were already complete and did not need multiplication. - I then Created a new column called 'NewWeight' which contained the values of
WeightValue
multiplied by theWeightMultiplier
column. This ensured that all mathematical expressions were correctly converted to the right value. I used the following DAX to create this column:[NewWeight = 'Products'[WeightMultiplier] * 'Products'[WeightValue]]
- I then needed to standardise the weight values, so I converted all Weight Values into Kilograms by creating a new calculated column called 'KG_Weight', which used the following DAX to convert all non-Kg values into Kg values:
[KG_Weight = IF('Products'[WeightUnit] <> "kg", 'Products'[NewWeight] / 1000, 'Products'[NewWeight])]
- Using 'Columns from Example' in the Power Query Editor, I separated the
-
Data Cleanup:
- Lastly, I renamed all columns according to Power BI conventions for consistency and clarity.
-
Connect to Azure Blob Storage:
- Using Power BI's Get Data option, I connected to Azure Blob Storage and imported the
Stores
table by utilising the provided Blob Storage credentials.
- Using Power BI's Get Data option, I connected to Azure Blob Storage and imported the
-
Column Renaming:
- I renamed the columns in the dataset to align with Power BI conventions, ensuring clarity in the report.
-
Import Customers Folder:
- I downloaded and unziped the
Customers.zip
file. - Using the Get Data option, I imported the Customers folder into Power BI, combining and transforming the data.
- I downloaded and unziped the
-
Data Manipulation:
- Createx a
FullName
column by combining[First Name]
and[Last Name]
. - Deleted unnecessary columns and renamed the remaining ones according to Power BI conventions.
- Createx a
This milestone involved connecting to various data sources, importing tables into Power BI, and performing necessary data transformations to ensure data integrity, privacy, and clarity. The README provides detailed steps for each table, guiding you through the process of loading, cleaning, and structuring the data for effective reporting in Power BI.
-
Continuous Date Table: Created a continuous date table covering the entire time period of the data from Orders['Order Date'] to Orders['Shipping Date'] using DAX formula:
Dates = ADDCOLUMNS ( CALENDAR ( MIN ( Orders[OrderDate] ), MAX ( Orders[ShippingDate] ) ),
-
Date Table Columns: Added new Date columns using the following DAX formula:
-
I then Used the following DAX forumlas separately to create the following DateTime measures:
- Start of Year: `Start_of_Year = STARTOFYEAR('Dates'[Date]) ` - Start of Quarter: `Start_of_Quarter = STARTOFQUARTER(Dates[Date]) ` - Start of Month: `Start_of_Month = STARTOFMONTH(Dates[Date]) ` - Start of Week: `Start_of_Week = VAR WeekStartDate = [Date] - WEEKDAY([Date], 2) + 1 RETURN WeekStartDate`
-
Star Schema: Established a star schema by creating relationships between tables.
-
Table Relationships:
- Orders[ProductCode] to Products[ProductCode]
- Orders[StoreCode] to Stores[StoreCode]
- Orders[UserID] to Customers[UserUUID]
- Orders[OrderDate] to Date[Date]
- Orders[ShippingDate] to Date[Date]
-
Active Relationship: Ensured the relationship between Orders[Order Date] and Date[date] is an active relationship with a one-to-many relationship.
- Measures Table: Created a separate table for measures named "Measures_Table" in Power Query Editor to organise and manage measures effectively.
- Key Measures:
Created the following Key Meausres in the Measures_Table using the following DAX formulas:
- Total Orders: Count of orders in the Orders table.
Total_Orders = COUNTROWS(VALUES('Orders'[UserID]))
- Total Revenue: Sum of (Product Quantity * Sale_Price) for each order.
Total_Revenue = SUMX(Orders, Orders[ProductQty] * RELATED(Products[SalePrice]))
- Total Profit: Sum of (Product Quantity * (SalePrice - CostPrice)) for each order.
Total_Profit = SUMX(Orders, (RELATED(Products[SalePrice]) - RELATED(Products[CostPrice])) * Orders[ProductQty])
- Total Customers: Count of unique customers in the Orders table.
TotalCustomers = DISTINCTCOUNT(Orders[UserID])
- Total Quantity: Count of items sold in the Orders table.
Total_Quantity = SUM(Orders[ProductQty])
- Profit YTD: Total profit for the current year.
Profit_YTD = TOTALYTD(Measures_Table[Total Profit], Dates[Date])
- Revenue YTD: Total revenue for the current year.
Revenue YTD = TOTALYTD(Measures_Table[Total Revenue], Dates[Date])
- Total Orders: Count of orders in the Orders table.
-
Date Hierarchy: Created a date hierarchy with levels: Start of Year, Start of Quarter, Start of Month, Start of Week, Date for drill-down in line charts.
-
Geography Hierarchy: Created a geography hierarchy with levels: World Region, Country, Country Region for data filtering.
-
Country and Geography Columns: Added calculated columns in the Stores table for a full country name and geography based on specified schemes.
- Data Categories: Ensured specific columns (World Region, Country, Country Region) were categorised correctly for accurate mapping.
Model View: Please see the screenshot below for the up-to-date model view for this project.
- Creating report pages: Created an Executive Summary page, Customer Detail page, Product Detail page and Stores Map page.
- Selecting a colour theme: I selected a colour theme that I thought would look good as a finished report.
- Adding Navigation sidebars: added a rectangle shape covering a narrow strip on the left side of each page. This would be the sidebar that we will use to navigate between pages later in my report build.
-
Created two rectangles and arrange them in the top left corner of the page. These served as the backgrounds for the card visuals.
-
Added a card visual for the
[TotalCustomers]
measure I created earlier, renaming the field Unique Customers. I did this by selecting a card visual and dragging the[TotalCustomers]
measure into the card visual. I then formatted the card according to my report colour scheme. -
Created a new measure in my
Measures
Table called[RevenuePerCustomer]
. This was created by dividing[Total Revenue]
by the[Total Customers]
measures. -
Added a card visual for the
[RevenueperCustomer]
measure, by following the same steps I used for[TotalCustomers]
.
-
Added a Donut Chart visual showing the total customers for each country, by using the
[Users[Country]
column to filter the[Total Customers]
measure. -
Added a Column Chart visual showing the number of customers who purchased each product category, using the
Products[Category]
column to filter the[Total Customers]
measure.
-
Added a Line Chart visual to the top of the page, that showed
[Total Customers]
on the Y axis, and use the Date Hierarchy I created in Step 8 of Milestone 2 for the X axis. I have allowed users to drill down to the month level, but not to weeks or individual dates. -
Added a trend line, and a forecast for the next 10 periods with a 95% confidence interval.
- Created a new table, which displays the top 20 customers, filtered by revenue. The table shows each customer's full name, revenue, and number of orders.
- Added conditional formatting to the revenue column, to display data bars for the revenue values.
- Created a set of three card visuals that provide insights into the top customer by revenue. They display the top customer's name, the number of orders made by the customer, and the total revenue generated by the customer.
- Added a date slicer to allow users to filter the page by year, using the between slicer style. I did this by selecting the date slicer visualisation, and only adding Year to the field in Build a visual.
Finished Customer Details page:
- Copied a grouped card visual from the Customer Details page.
- Pasted it onto the Executive Summary page.
- Duplicated the card two more times.
- Arranged the three cards to span about half of the width of the page.
- Assigned them to
Total Revenue
,Total Orders
, andTotal Profit
measures.
- Used the Format > Callout Value pane:
- Ensure there were no more than 2 decimal places for Revenue and Profit cards.
- Set 1 decimal place for the Total Orders measure.
- Copied the line graph from the Customer Detail page.
- Set X-axis to
Date Hierarchy
withStart of Year
,Start of Quarter
, andStart of Month levels
. - Set Y-axis to Total Revenue.
- Positioned the line chart just below the cards.
- Added two donut charts:
- Copied the
Total Customers
byProduct Category
column chart. - In the on-object Build a visual pane, I changed the visual type to Clustered bar chart.
- Changed the X-axis field from
Total Customers
toTotal Orders
. - Altered the colour for my theme in the Colours tab.
- Used the following DAX to create the following measures:
- Previous Quarter Profit.
Previous Quarter Profit = CALCULATE( [Total Profit], DATEADD('Dates'[Start_of_Quarter], -1, QUARTER) )
- Previous Quarter Revenue.
Previous Quarter Revenue = CALCULATE( [Total Revenue], DATEADD('Dates'[Start_of_Quarter], -1, QUARTER) )
- Previous Quarter Orders.
Previous Quarter Orders = CALCULATE( [Total Orders], DATEADD('Dates'[Start_of_Quarter], -1, QUARTER) )
- Targets (5% growth in each measure compared to the previous quarter).
Target Profit = [Previous Quarter Profit] * 1.05
Target Revenue = [Previous Quarter Revenue] * 1.05
-Target Orders = [Previous Quarter Orders] * 1.05
- Added a new KPI for Total Revenue, Total Profit and Total Orders:
- Value field: Total Revenue.
- Trend Axis: Start of Quarter.
- Target: Target Revenue.
- In the Format pane:
- Set Trend Axis to On.
- Set Direction to High is Good.
- Set Bad Colour to red.
- Set Transparency to 15%.
-
Formatted the Callout Value to show only 1 decimal place.
-
Duplicated the card two more times.
-
Set appropriate values for the Profit and Orders cards.
Finished Executive Summary Page:
- Added three gauges for Orders, Revenue, and Profit.
- Defined DAX measures for metrics and quarterly targets:
10% Target Quarter Orders = 'Measures_Table'[Previous Quarter Orders] * 1.1
- Current Quarter Orders = CALCULATE( TOTALQTD('Measures_Table'[Total Orders],'Dates'[Date]) )`
- Set maximum gauge values to quarterly targets.
- Applied conditional formatting to callout values so that it remains red until the target is reached.
- Arranged gauges evenly along the top of the report.
- Added rectangle shapes for card visuals.
- Used a color in keeping with the theme.
- Added an area chart for revenue over time.
- Configured X-axis to
Dates[Start of Quarter]
. - Y-axis values to
Total Revenue
. - Legend to
Products[Category]
.
- Copied the top customer table from
Customer Detail
page. - Included fields:
Product Description
,Total Revenue
,Total Customers
,Total Orders
,Profit per Order
.
- Created a calculated column
[Profit per Item]
by using the following DAX formula:Profit per Item = SUMX(Products,Products[SalePrice] - Products[CostPrice])
- Added a scatter chart with X-axis as
[Profit per Item]
and Y-axis as[Total Quantity]
. - Set Legend to
Products[Category]
.
- Downloaded custom icons collection.
- Added a custom icon button to the navigation bar.
- Created a rectangle shape for slicer panel.
- Add two vertical list slicers:
Products[Category]
andStores[Country]
. - Configured slicers for neat formatting.
- Grouped slicers with the slicer toolbar shape.
- Added a Back button and positioned it sensibly.
- Created bookmarks for open and closed states of the toolbar.
- Assigned actions to buttons using bookmarks.
Finished Product Detail page:
- On the Stores Map page, I added a new map visual from the visualisations section.
- Set the style in the Format pane to my satisfaction and ensured Show Labels is set to On.
- Map Controls:
- Auto-Zoom: On
- Zoom buttons: Off
- Lasso button: Off
- Assigned Geography hierarchy to the Location field, and ProfitYTD to the Bubble size field.
- Added a slicer above the map.
- Set the slicer field to
Stores[Country]
. - Formatted the slicer:
- Slicer style as Tile.
- Selection settings to Multi-select with Ctrl/Cmd.
- Show "Select All" as an option in the slicer.
Finished Stores Map page:
- Created a new page named
Stores Drillthrough
. - Opened the format pane and expanded the Page information tab.
- Set the Page type to Drillthrough.
- Set Drill through when to Used as category.
- Set Drill through from to country region.
- Added the following visuals to the drillthrough page:
- A table showing the top 5 products with columns: Description, Profit YTD, Total Orders, Total Revenue.
- A column chart showing Total Orders by product category for the store.
- Gauges for Profit YTD against a profit target of 20% year-on-year growth vs. the same period in the previous year, using the Target field, not the Maximum Value field.
- A Card visual showing the currently selected store.
Finished Drillthrough page:
- Created a new page named Store's Tooltip.
- Copied over the Profit Gauge visual from the drillthrough page.
- Set the tooltip of the visual to the Store's Tooltip page.
- Added two visual cards that showed the country and the region selected.
- Made sure the size of the tooltip page fit the tooltip, to make sure that it didn't take up too much room on the Stores Map page. I did this by entering a custom size on the 'canvas settings' section on the page's format pane.
Finished Tooltip page:
Finished Tooltip on Map Page:
To do this, I clicked on the 'Edit Interactions' button for the visualisation I wanted to cross-filter. I then selected the filter icon and selected the correct type of cross filtering needed for the other visualisations.
For example, in this cross filtering for the Total Customers by Product Bar Chart on the Customer Details page, I selected the small icon so that there would be no impact on the Customers Line Graph.
Here is the cross-filtering I did for this report's pages:
- Product Category bar chart and Top 10 Products table not filtering the card visuals or KPIs.
- Top 20 Customers table not filtering any of the other visuals.
- Total Customers by Product Category Bar Chart not affecting the Customers line graph.
- Total Customers by Country Donut chart cross-filter Total Customers by Product Donut Chart.
- Orders vs. Profitability scatter graph not affecting any other visuals.
- Top 10 Products table not affectinf any other visuals.
For each page, there was a custom icon available in the custom icons collection: 7. Using the white version for the default button appearance. 8. setting the blue version for the button when hovered over with the mouse pointer.
- Added four new blank buttons.
- In the Format > Button Style pane, set Apply settings to Default.
- Set each button icon to the relevant white png in the Icon tab.
- For each button, set Apply settings to On Hover, and selected the alternative colourway of the relevant button under the Icon tab.
- Turned on the Action format option for each button.
- Selected the type as Page navigation.
- Selected the correct page under Destination. For Example:
- Grouped the buttons together.
- Copied the group across to the other pages.
Finished Navigation Bar:
To connect to the PostgreSQL database and run queries from Visual Studio Code, I followed these steps:
- Opened Visual Studio Code.
- Went to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window or use the shortcut
Ctrl+Shift+X
. - Searched for "SQLTools" in the Extensions view search bar.
- Found the SQLTools extension and clicked the "Install" button.
- Once the SQLTools extension was installed, I clicked on the SQLTools icon in the Activity Bar.
- In the SQLTools view, I clicked on the "Connections" tab.
- Clicked on the "Add Connection" button.
Filled in the connection details using information provided:
- Name: Gave my connection a name (e.g., Azure Postgres).
- Database Type: NA
- Server: NA
- Port: NA
- Database: NA
- User: NA
- Password: NA
After entering the connection details, I clicked the "Test Connection" button to ensure that the connection was successful. Now that I had successfully connected to the PostgreSQL database, I could open a new SQL file in Visual Studio Code, select my connection from the dropdown, start writing and executing SQL queries.
- To understand the structure of the database, I started by printing a list of all tables and saving the result to a CSV file.
- I did this by using the following SQL Query:
SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE' ORDER BY table_name; ```
- I did this by using the following SQL Query:
Execution Steps:
- I executed the SQL query against my database.
- Saved the result locally to a CSV file named
table_names.csv
for reference.
- Next, I explored the columns in the
orders
table and saved the result to a CSV file calledorders_columns.csv
.- I did this by using the following SQL Query:
SELECT column_name FROM information_schema.columns WHERE table_name = 'orders';
- I did this by using the following SQL Query:
Execution Steps:
- Executed the SQL query against my database.
- Saved the result to a CSV file named
orders_columns.csv
for reference.
- For each table in the database, I then repeated the process of printing a list of columns and saving the results to CSV files with corresponding names.
Example SQL Query:
-- Replacing 'your_table_name' with the actual table name
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'your_table_name';
Execution Steps:
- Executed the SQL query against your database for each table.
- Saved the results to CSV files with names matching the respective table names (e.g.,
your_table_name_columns.csv
).
This process provided me with a comprehensive overview of the tables and columns in my database, aiding my exploration and analysis.
Please follow these next steps and the SQL queries I used to answer the following questions:
-- SQL Query for Question 1
SELECT
country,
SUM(staff_numbers) AS total_staff
FROM
dim_store
WHERE
country = 'UK'
GROUP BY
country;
Execution Steps:
- Executed the SQL query against my database.
- Exported the result to a CSV file named:
- Saved the SQL query to a file named:
-- SQL Query for Question 2
SELECT
month_name,
ROUND(SUM(sale_price * product_quantity)::numeric, 2) AS revenue_total
FROM
forquerying2
WHERE
EXTRACT(YEAR FROM dates::timestamp) = 2022
GROUP BY
month_name
ORDER BY
revenue_total DESC
LIMIT
1;
Execution Steps:
- Executed the SQL query against your database.
- Exported the result to a CSV file named:
- Saved the SQL query to a file named:
-- SQL Query for Question 3
SELECT
store_type,
country,
ROUND(SUM(sale_price * product_quantity)::numeric, 2) AS revenue
FROM
forquerying2
WHERE
EXTRACT(YEAR FROM dates::timestamp) = 2022
AND country = 'Germany'
GROUP BY
store_type, country
ORDER BY
revenue DESC
LIMIT
1;
Execution Steps:
- Executed the SQL query against my database.
- Exported the result to a CSV file named:
- Saved the SQL query to a file named:
Question 4: Create a view with store types as rows and columns as total sales, percentage of total sales, and the count of orders
-- SQL Query for Question 4
CREATE VIEW question_4 AS
SELECT
store_type,
ROUND(CAST(SUM(sale_price * product_quantity) AS numeric), 2) AS total_sales,
ROUND(CAST(SUM(sale_price * product_quantity) / SUM(SUM(sale_price * product_quantity)) OVER () * 100 AS numeric), 2) AS sales_percentage,
COUNT(order_date) AS orders
FROM
forview
GROUP BY
store_type;
Execution Steps:
-
Executed the SQL query against my database.
Question 5: Which product category generated the most profit for the "Wiltshire, UK" region in 2021?
-- SQL Query for Question 5
SELECT
category,
full_region AS region,
SUM(sale_price * product_quantity) AS total_revenue,
ROUND(SUM(cost_price * product_quantity)::numeric, 2) AS total_cost,
ROUND(SUM((sale_price * product_quantity) - (cost_price * product_quantity))::numeric, 2) AS total_profit
FROM
forquerying2
WHERE
EXTRACT(YEAR FROM dates::timestamp) = 2021 AND full_region = 'Wiltshire, UK'
GROUP BY
category,
full_region
ORDER BY
total_profit DESC
LIMIT
1;
Execution Steps:
- Executed the SQL query against my database.
- Exported the result to a CSV file named:
- Saved the SQL query to a file named:
And finally...
Through this project, I embarked on a comprehensive journey in data analytics using Power BI and SQL tools. From importing and transforming data tables to creating a robust data model, each milestone brought unique challenges and learning opportunities. Navigating through various SQL queries enhanced my understanding of database exploration and querying, particularly in the context of real-world scenarios like calculating staff numbers across UK stores. The creation of interactive and visually appealing Power BI reports, complete with cross-filtering mechanisms and navigation bars, further solidified my grasp of data visualisation techniques. This project has equipped me with valuable skills in data manipulation, model creation, and insightful report design, forming a solid foundation for future endeavors in data analytics. I extend my gratitude to the readers for engaging with this journey, and I hope the insights shared prove beneficial in their own data exploration and analysis endeavors!