RevoU SQL Assignment

Project Summary

  1. Created SQL query based on business problem using aggregate functions, date format, joins, CTE, and window function.
  2. Created monthly retention cohort analysis using SQL function to help retain users.

Insights

CASE 1

  1. December is the month with the longest average bicycle rental duration with 23.81 minutes, followed by July with 17.31 minutes and June with 16.99 minutes.
  2. San Francisco has the most total trips and bikes, followed by Oakland, and finally San Jose.
  3. The male gender has the oldest age of 136 years and the youngest age of 22 years. The female gender has the oldest age of 122 years and the youngest age of 22 years. Gender others have the oldest age 122 years and the youngest age 22 years.
  4. The last recorded departure trip was on 2016-08-31 from Washington at Keamy station.
  5. July 2017 was the month with the highest growth percentage of 1311% and June 2017 was the month with the lowest growth percentage of -91%.

CASE 2

  1. The number of story posts has decreased continuously from Feb 2014 until Dec 2014. It is shown by graph above, monthly. The forum needs to improve its user engagement and retention immediately since this decrease in activity has become a trend (downtrend).

Project Files

For a more comprehensive analysis and visualization, please open the project files.

Project Background


As a data analyst, we are expected to use SQL to access, read, manipulate, and analyze the data stored in the database and generate useful insights to drive an informed decision-making process. Therefore, we tested our level of comprehension by doing the following assignment.

Data Scope, Goals & Objectives

CASE 1 | San Fransisco Ford GoBike Dataset

This dataset was provided by google bigquery public data in google cloud platform. San Francisco Ford GoBike , managed by Motivate, provides the Bay Area’s bike share system. Bike share is a convenient, healthy, affordable, and fun form of transportation. It involves a fleet of specially designed bikes that are locked into a network of docking stations. Bikes can be unlocked from one station and returned to any other station in the system. People use bike share to commute to work or school, run errands, get to appointments, and more. The dataset contains trip data from 2013-2018, including start time, end time, start station, end station, and latitude/longitude for each station.

CASE 2 | Hacker News Dataset

This dataset was provided by google bigquery public data in google cloud platform. This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received.

Goals

Understand basic to advanced SQL function to help access, read, manipulate, and analyze the data stored in the database and generate useful insights to drive an informed decision-making process.

Objectives

  1. Creating SQL query based on business problem using aggregate functions, date format, joins, CTE, and window function.
  2. Creating monthly retention cohort analysis to help retain users.

Intermediate Assignment

CASE 1

1. Create a query to get the average amount of duration (in minutes) per month


December is the month with the longest average bicycle rental duration with 23.81 minutes, followed by July with 17.31 minutes and June with 16.99 minutes.

2. Create a query to get total trips and total number of unique bikes grouped by region name


San Francisco has the most total trips and bikes, followed by Oakland, and finally San Jose.

3. Find the youngest and oldest age of the members, for each gender. Assume this year is 2022.


The male gender has the oldest age of 136 years and the youngest age of 22 years. The female gender has the oldest age of 122 years and the youngest age of 22 years. Gender others have the oldest age 122 years and the youngest age 22 years.

4. Get the latest departure trip in each region.


The last recorded departure trip was on 2016-08-31 from Washington at Keamy station.

5. Create a query to get Month to Date of total trips in each region, breakdown by date.



Advanced Assignment

6. Find monthly growth of trips in percentage, ordered by time descendingly. Only for trips from the region that has the highest total number of trips.



July 2017 was the month with the highest growth percentage of 1311% and June 2017 was the month with the lowest growth percentage of -91%.

CASE 2

7. Given another dataset (Hacker News) please use table “stories” to create monthly retention cohorts (the groups, or cohorts, can be defined based upon the date that a user/author started a story) and then how many of them (%) coming back for the following monts in 2014. After analyzing the retention cohort, is there any interesting insight that we can get?




The number of story posts has decreased continuously from Feb 2014 until Dec 2014. It is shown by graph above, monthly. The forum needs to improve its user engagement and retention immediately since this decrease in activity has become a trend (downtrend).


Home Projects