RevoU Data Visualization Assignment
Project Summary
- Conducted exploratory data analysis on two datasets, one for the intermediate assignment and one for the advanced assignment, identified interesting insights, and told the story as concisely as possible.
- Visualized each finding using data visualization best practices so that the data is easy for a general audience to interpret.
- Created an interactive dashboard for each assignment and made sure the data presented on it is on target and usable for stakeholders.
- Created a presentation deck to compile the work done.
Insights
Intermediate
- From October 2016 to August 2018, the number of orders followed an upward trend. The number of orders peaked in March 2018 with 347 orders.
- Of the product categories that are in the top 10 based on the number of orders, Bed Bath Table ranks first with 452 orders, followed by Health Beauty in second place with 400 orders, and Sports Leisure in third place with 359 orders.
- Of the customer states in the top 5 by number of customers, SP (São Paulo) ranks first with 1,959 customers, followed by RJ (Rio de Janeiro) in second with 639 customers, and MG (Minas Gerais) in third with 523 customers.
- Segmentation is carried out on the Payment Value from the Order Payments Dataset. Payment values are divided into 4 price groups: 0-50 (low), 51-100 (medium), 101-1,000 (high), and >= 1,001 (very high). The price group with the highest demand is the high group (101-1,000), with a total of 2,385 orders.
Advanced
- A histogram of the binned price data shows that the data is not evenly distributed: the distribution is positively skewed, and the most frequent prices fall in the 40-80 range.
- After segmenting the price data with a calculated field, the most frequent price category is medium (101-1,000), closely followed by low (0-100), with high (>= 1,001) last.
- Central Region is the neighborhood group with the most total listings and Private Room is the room type with the most total listings.
- The most frequently reviewed room type is entire home/apt, followed by private room, hotel room, and shared room.
- The most expensive neighborhood is Tuas, and most neighborhoods are located in the Central Region.
- The neighborhood with the highest number of reviews is Geylang with 8,600 reviews, followed by Kallang with 6,200 reviews, and closely by Rochor with 6,100 reviews.
- The neighborhood with the highest average price is Tuas with an average price of $10.29K SGD, followed by Southern Island with $1.59K SGD, and then Orchard with $570.17 SGD.
- The most expensive listing is Corner Terrace, located in Bedok, with a minimum price of $1,825,000 SGD.
Project Files
For a more comprehensive analysis and visualization, please open the project files.
Project Background
Data visualization is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining to check data quality and to help analysts become familiar with the structure and features of the data before them. To test our capability in data visualization, we visualize data in an intermediate and an advanced assignment. I use Tableau Public as the main data visualization tool, and I also use Google Data Studio.
Data Scope, Goals & Objectives
For the intermediate assignment, we used the Brazilian e-commerce data provided by Olist on Kaggle, including the Customers Dataset. The dataset has information on 100k orders made from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment, and freight performance to customer location, product attributes, and finally reviews written by customers. Olist also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.
For the advanced assignment, we used the Singapore Airbnb listings data from Kaggle. According to the website, the data was collected on 28 August 2019. There are 7,907 samples, but some features/variables have missing data.
Goals
Our goal is to learn how to visualize data with concise meaning using Tableau and Google Data Studio. We also learn how to create attractive, interactive dashboards.
Objectives
- Perform exploratory data analysis on the given datasets and visualize the findings with appropriate charts and best practices to make the data storytelling as concise as possible.
- Combine the visualizations into dashboards that solve the client's problem by tracking their progress or KPIs.
Data Analysis
Intermediate Assignment
From October 2016 – August 2018, the number of orders experienced an upward trend. The number of orders reached its peak in March 2018 with 347 orders.
Of the product categories that are in the top 10 based on the number of orders, Bed Bath Table ranks first with 452 orders, followed by Health Beauty in second place with 400 orders, and Sports Leisure in third place with 359 orders.
Of the customer states in the top 5 by number of customers, SP (São Paulo) ranks first with 1,959 customers, followed by RJ (Rio de Janeiro) in second with 639 customers, and MG (Minas Gerais) in third with 523 customers.
In this visualization, sorting is applied with a custom index at the deepest level. The sort ranks the top 3 Customer Cities and restarts within each Customer State. Cities are ranked by the count of unique Customer IDs.
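The ranking described above was built in Tableau; the following is only a minimal Python sketch of the same idea, not the workbook's actual calculation. The function name `top_cities_per_state`, the row layout `(customer_id, state, city)`, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def top_cities_per_state(rows, n=3):
    """Rank cities inside each state by their number of unique customer IDs.

    `rows` is an iterable of (customer_id, state, city) tuples (assumed layout).
    """
    # Collect the distinct customer IDs seen for every (state, city) pair.
    customers = defaultdict(set)
    for customer_id, state, city in rows:
        customers[(state, city)].add(customer_id)

    # Group the per-city unique counts under their state.
    per_state = defaultdict(list)
    for (state, city), ids in customers.items():
        per_state[state].append((city, len(ids)))

    # The ranking restarts in each state; ties are broken alphabetically.
    return {
        state: sorted(cities, key=lambda c: (-c[1], c[0]))[:n]
        for state, cities in per_state.items()
    }


# Made-up rows for illustration only, not values from the Olist data.
sample_rows = [
    ("c1", "SP", "sao paulo"),
    ("c2", "SP", "sao paulo"),
    ("c2", "SP", "sao paulo"),  # repeated customer, counted once
    ("c9", "SP", "sao paulo"),
    ("c3", "SP", "campinas"),
    ("c6", "SP", "campinas"),
    ("c4", "SP", "santos"),
    ("c5", "SP", "osasco"),
    ("c7", "RJ", "rio de janeiro"),
    ("c8", "RJ", "niteroi"),
]

ranking = top_cities_per_state(sample_rows)
print(ranking["SP"])
```

Counting distinct IDs rather than rows matters here because a customer who appears more than once should not inflate a city's rank.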
Segmentation is carried out on the Payment Value from the Order Payments Dataset. Payment values are divided into 4 price groups: 0-50 (low), 51-100 (medium), 101-1,000 (high), and >= 1,001 (very high). The price group with the highest demand is the high group (101-1,000), with a total of 2,385 orders.
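As a rough illustration of the segmentation, the four price groups can be expressed as a small Python function. This is a sketch and not the Tableau calculated field itself. The name `price_group` and the sample values are hypothetical, and it assumes that fractional values between two ranges (for example 50.50) belong to the next group up, which the ranges above do not specify.

```python
from collections import Counter


def price_group(payment_value: float) -> str:
    """Map a payment value to one of the four price groups.

    Assumption: each upper bound (50, 100, 1000) is inclusive, so a value
    such as 50.50 falls into the next group up.
    """
    if payment_value < 0:
        raise ValueError("payment value must be non-negative")
    if payment_value <= 50:
        return "low"
    if payment_value <= 100:
        return "medium"
    if payment_value <= 1000:
        return "high"
    return "very high"


# Made-up payment values for illustration only.
sample_payments = [18.5, 75.0, 100.0, 250.9, 999.99, 1500.0]

# Count how many payments fall into each group.
group_counts = Counter(price_group(value) for value in sample_payments)
print(group_counts)
```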
Advanced Assignment
In this visualization, the binned price data is shown as a histogram. The data is not evenly distributed: the histogram is positively skewed, and the most frequent prices fall in the 40-80 range.
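To make the binning and the skewness claim concrete, here is a minimal sketch in Python. The bin width of 40 is an assumption inferred from the 40-80 range above, the function names are hypothetical, and the sample prices are made up rather than taken from the Airbnb data. Skewness is computed as the standard third-moment coefficient, where a positive value indicates a long right tail.

```python
from collections import Counter


def bin_prices(prices, bin_width=40):
    """Count prices per bin; each key is the lower edge of a
    [edge, edge + bin_width) interval."""
    counts = Counter((int(price) // bin_width) * bin_width for price in prices)
    return dict(sorted(counts.items()))


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up nightly prices for illustration only.
sample_prices = [45, 50, 60, 65, 70, 75, 90, 120, 150, 400, 1200]

histogram = bin_prices(sample_prices)
busiest_bin = max(histogram, key=histogram.get)

print(histogram)
print(busiest_bin, skewness(sample_prices) > 0)
```

In this sample, a few very expensive values pull the mean far above the median, which is the pattern a positively skewed histogram shows.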
The neighborhood group visualization of total listings, broken down by price category, shows that Central Region is the neighborhood group with the most listings.
The room type visualization of total listings, broken down by price category, shows that Private Room is the room type with the most listings.
Based on the number of reviews in the last 12 months, the most frequently reviewed room type is entire home/apt, followed by private room, hotel room, and shared room.
In this visualization, I plot the neighborhoods using longitude and latitude data. I add color and size differences to provide more information: the darker the color, the more expensive the neighborhood, and the larger the mark, the more listings the neighborhood has. The most expensive neighborhood is Tuas, and most neighborhoods are located in the Central Region.
From the chart, we can see that the neighborhood with the highest number of reviews is Geylang with 8,600 reviews, followed by Kallang with 6,200 reviews, and closely by Rochor with 6,100 reviews.
From the chart, we can see that the neighborhood with the highest average price is Tuas with an average price of $10.29K SGD, followed by Southern Island with $1.59K SGD, and then Orchard with $570.17 SGD.
In this visualization, I plot each listing name using Geodata (a calculated field that concatenates the latitude and longitude fields). I color the marks by neighborhood and size them by minimum price (a calculated field that multiplies price by minimum nights) to provide more information. The larger the mark, the higher the minimum price. The most expensive listing is Corner Terrace, located in Bedok, with a minimum price of $1,825,000 SGD.
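The two calculated fields can be sketched in Python as follows. This is an illustration under assumptions, not the Tableau formulas themselves: the `Listing` class, its field names, the comma separator used for the concatenation, and the sample listings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    neighbourhood: str
    latitude: float
    longitude: float
    price: float          # nightly price in SGD
    minimum_nights: int

    @property
    def min_price(self) -> float:
        """Minimum price of a stay: nightly price times minimum nights."""
        return self.price * self.minimum_nights

    @property
    def geodata(self) -> str:
        """Latitude and longitude concatenated (the separator is an assumption)."""
        return f"{self.latitude},{self.longitude}"


# Made-up listings for illustration only.
listings = [
    Listing("Listing A", "Bedok", 1.32, 103.93, 100.0, 3),
    Listing("Listing B", "Geylang", 1.31, 103.88, 80.0, 2),
]

# The listing with the largest minimum price gets the largest mark.
most_expensive = max(listings, key=lambda listing: listing.min_price)
print(most_expensive.name, most_expensive.min_price, most_expensive.geodata)
```

Because the minimum price multiplies the nightly price by the minimum stay, a listing with a long minimum stay can rank as most expensive even when its nightly price is moderate.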
Dashboards
Intermediate Assignment
Advanced Assignment