Toronto Rental Insight App

This project is a continuation of the Toronto_Rental_ETL_Project, which scraped, cleaned, transformed and stored data from multiple data sources and made it available through a Flask API, which can be found here. This project further automates data acquisition with a scheduler that scrapes the data sources daily and updates the database, and it provides more granular data through improved API services. It also makes the data available for exploration through a user-friendly, interactive dashboard, and hosts the application in the cloud.

Check out the Final Product: Here

Project Intro/Objective

Applications that provide classified rental services, such as Craigslist and Kijiji, focus only on the details of the listed rentals. That is not necessarily all that users are looking for. A renter would also want to know how safe the neighbourhood is, which community services are available nearby, and what the socio-economic profile of the location is (such as average income and age). This project aims to provide users with that context about and around the rental listings. Specifically, its purpose is to help users find a rental property that fits within their budget and meets other requirements, such as being located in a low-crime neighbourhood and in close proximity to community services such as schools and healthcare centres.

Data Sources

Methods Used

  • Data Extraction (Selenium, BeautifulSoup, Google API, MapQuest API)
  • Data Transformation (Python: pandas, NumPy, regex)
  • Data Loading (MongoDB cloud, MongoClient)
  • Automation of Extraction, Transformation & Load (Advanced Python Scheduler on Heroku)
  • API services (Flask)
  • Data Visualization (Leaflet with the leaflet-sidebar extension, heatmap plugin and Leaflet Panel Layers plugin; Plotly; D3 for data handling for visuals)

Project Architecture

The architecture is a full stack: Automated ETL -> Python - Flask -> HTML/CSS/JS

Automated ETL

  • Extract: Extracts the data from Kijiji, Craigslist, Toronto Police Services (TPS), Canada Revenue Agency (CRA), and Stats Canada using scraping and API services.

  • Transform: Transforms the data using various Python packages, including pandas and NumPy.

  • Load: Loads the data into the cloud MongoDB (Atlas) database. The ETL is separate from the cloud application and serves as standalone functionality to extract, transform and preload the database.

  • Automation: The extraction of rental data (from Craigslist only), along with its transformation and loading, is automated.

    • Scraper: Crawls the Rental Data from Craigslist
    • Scheduler: Hosted on Heroku as a separate scheduling application; it runs the ETL every day at 4:30 AM UTC (the time zone shown in the Heroku logs; there is no particular reason for this time).
    • Differencer:
      • Updates the current and historical rental DB tables.
      • Current rental data consists of only the currently available rental listings.
      • Historic rental data consists of all rental listings.

    A snippet of the automated ETL process is shown below (Heroku logs).

    2020-09-26T04:30:00.001509+00:00 app[clock.1]: Started updateDB
    2020-09-26T04:34:04.395572+00:00 app[clock.1]: Finished craigs_list_api_call
    2020-09-26T04:34:05.423526+00:00 app[clock.1]: Finished differencer
    2020-09-26T04:34:06.736165+00:00 app[clock.1]: Finished instatiate_driver
    2020-09-26T04:34:13.404711+00:00 app[clock.1]: Finished craigs_list_scrape
    2020-09-26T04:34:13.434493+00:00 app[clock.1]: Finished clean_craigslist
    2020-09-26T04:34:13.448337+00:00 app[clock.1]: Finished geocode
    2020-09-26T04:34:13.924970+00:00 app[clock.1]: Finished fill_Lat_Long
    2020-09-26T04:34:13.953593+00:00 app[clock.1]: Finished clean_rental_for_merg
    2020-09-26T04:34:14.026133+00:00 app[clock.1]: Finished updateDB
    2020-09-26T04:34:14.033489+00:00 app[clock.1]: Finished Updating the DB
    2020-09-28T04:34:28.051408+00:00 app[clock.1]: Finished craigs_list_api_call
    2020-09-28T04:34:28.999139+00:00 app[clock.1]: Finished differencer
    2020-09-28T04:34:31.631729+00:00 app[clock.1]: Finished instatiate_driver
    2020-09-28T04:36:11.449854+00:00 app[clock.1]: Finished craigs_list_scrape
    2020-09-28T04:36:11.494136+00:00 app[clock.1]: Finished clean_craigslist
    2020-09-28T04:36:14.459126+00:00 app[clock.1]: Finished fill_Lat_Long
    2020-09-28T04:36:14.520678+00:00 app[clock.1]: Finished clean_rental_for_merg
    2020-09-28T04:36:14.729932+00:00 app[clock.1]: Finished updateDB
    2020-09-28T04:36:14.737698+00:00 app[clock.1]: Finished Updating the DB
    2020-09-29T04:35:03.578897+00:00 app[clock.1]: Finished craigs_list_api_call
    2020-09-29T04:35:04.963784+00:00 app[clock.1]: Finished differencer
    2020-09-29T04:35:08.886718+00:00 app[clock.1]: Finished instatiate_driver
    2020-09-29T04:40:11.930050+00:00 app[clock.1]: Finished craigs_list_scrape
    2020-09-29T04:40:12.024977+00:00 app[clock.1]: Finished clean_craigslist
    2020-09-29T04:40:20.150364+00:00 app[clock.1]: Finished fill_Lat_Long
    2020-09-29T04:40:20.215436+00:00 app[clock.1]: Finished clean_rental_for_merg
    2020-09-29T04:40:20.488990+00:00 app[clock.1]: Finished updateDB
    2020-09-29T04:40:20.495897+00:00 app[clock.1]: Finished Updating the DB
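
The differencer step described above can be illustrated with a minimal sketch. It assumes each listing is a dict with a unique `id` field and that the tables are plain in-memory lists; the function name `difference_listings` and the field names are hypothetical and not taken from the project code, which works against MongoDB collections.

```python
from typing import Dict, List, Tuple

Listing = Dict[str, object]


def difference_listings(
    current: List[Listing],
    historic: List[Listing],
    scraped: List[Listing],
) -> Tuple[List[Listing], List[Listing], List[object]]:
    """Return (new_current, new_historic, expired_ids) after a daily scrape.

    Assumption: every listing carries a unique "id" key (hypothetical name).
    """
    scraped_by_id = {item["id"]: item for item in scraped}

    # Current table: only the listings present in today's scrape.
    new_current = list(scraped_by_id.values())

    # Listings that were current yesterday but are no longer posted.
    expired_ids = [item["id"] for item in current if item["id"] not in scraped_by_id]

    # Historic table: keep everything seen so far and append unseen listings.
    seen_ids = {item["id"] for item in historic}
    new_historic = list(historic)
    for listing_id, item in scraped_by_id.items():
        if listing_id not in seen_ids:
            new_historic.append(item)

    return new_current, new_historic, expired_ids
```

Returning new lists rather than mutating the inputs keeps the sketch easy to test; a database-backed version would more likely issue inserts and deletes directly.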

Python - Flask

Handles requests from the front-end JavaScript, interacts with MongoDB and provides the requested data in JSON format.
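
As a rough illustration of this request flow, the sketch below turns front-end query parameters into a MongoDB-style filter document and serializes the matching records to JSON. The parameter names (`fsa`, `max_price`, `bedrooms`), the document field names, and the helpers `build_rental_filter`, `matches` and `rentals_as_json` are assumptions for illustration; the actual endpoints and schema may differ. Only the standard library is used, so the filter is applied to an in-memory list rather than a live database.

```python
import json
from typing import Dict, List, Mapping


def build_rental_filter(params: Mapping[str, str]) -> Dict[str, object]:
    """Translate query-string parameters into a MongoDB-style filter dict."""
    query: Dict[str, object] = {}
    if params.get("fsa"):
        query["FSA"] = params["fsa"].upper()
    if params.get("max_price"):
        # $lte is MongoDB's "less than or equal" comparison operator.
        query["price"] = {"$lte": int(params["max_price"])}
    if params.get("bedrooms"):
        query["bedrooms"] = int(params["bedrooms"])
    return query


def matches(doc: Mapping[str, object], query: Mapping[str, object]) -> bool:
    """Evaluate the small subset of filter syntax produced above."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if value is None or value > condition["$lte"]:
                return False
        elif value != condition:
            return False
    return True


def rentals_as_json(docs: List[Dict[str, object]], params: Mapping[str, str]) -> str:
    """Filter documents by the request parameters and return a JSON string."""
    query = build_rental_filter(params)
    return json.dumps([doc for doc in docs if matches(doc, query)])
```

With a real database, the dict returned by `build_rental_filter` could be passed to a collection's `find` call instead of the in-memory `matches` helper.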

FrontEnd

The front end consists of the HTML/CSS/JavaScript stack. JavaScript retrieves the data from the APIs hosted by Flask based on the user's selection.

Frontend Wireframes

We created wireframes of the final product we had in mind. The wireframes below show the initial design developed by our entire team with the user in mind.

The Final Product

Map

The Leaflet library displays a map of Toronto with FSA boundaries outlined. By default, the map shows markers for the daily rental postings. The map includes a toggle bar that enables users to add/remove markers for:

  • Rental postings
  • Community assets
  • Homicides (crime data, past 6 months)

The map includes the following functionalities:

  • A tooltip appears when hovering over markers.
  • Upon clicking on a rental posting marker, a circle shows a 1 km radius around the marker. Other markers appear within that circle showing crime incidents that occurred in the past 6 months.
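
The 1 km radius lookup can be illustrated with a great-circle distance filter. This is a Python sketch of the underlying computation only; in the dashboard the circle is drawn in the browser, and the function names and the `lat`/`lon` field names here are assumptions.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def incidents_within(
    rental: Dict[str, float],
    incidents: List[Dict[str, float]],
    radius_km: float = 1.0,
) -> List[Dict[str, float]]:
    """Keep only the incidents that fall inside the circle around the rental."""
    return [
        incident
        for incident in incidents
        if haversine_km(rental["lat"], rental["lon"], incident["lat"], incident["lon"])
        <= radius_km
    ]
```

The haversine formula treats the Earth as a sphere, which is accurate enough at a 1 km scale.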

Sidebar

By default, when the dashboard opens, the sidebar contains dropdown menus that enable users to filter the displayed rental posting markers by:

  • FSA
  • Rental cost
  • Number of bedrooms

A filter button at the top of the sidebar enables users to toggle back to this view.

Otherwise, there are three versions of the sidebar, depending on how users interact with the map.

Click on Rental Posting Marker

  • Rental posting details
  • Bar chart of average income within the FSA compared to Toronto overall
  • Heat map of Toronto showing crime incidence

Click on Community Asset Marker

  • Community Asset information

Click on FSA

The sidebar shows charts explaining the trends in the selected FSA, such as age and income by age.

Rental posting data

  • Bar graph displaying the average cost to rent in the FSA (data scraped that day) by number of bedrooms.
  • Line graph displaying the average cost to rent in the FSA over time (historical/trending data) by number of bedrooms.
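
Both graphs rest on a simple grouping of rent by bedroom count. Below is a minimal sketch using only the standard library; the field names (`bedrooms`, `price`, `date`) and the function names are assumptions, and the project itself may compute these averages with pandas or in the database instead.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def average_rent_by_bedrooms(listings: List[dict]) -> Dict[int, float]:
    """Average price per bedroom count (the data behind a bar graph)."""
    prices = defaultdict(list)
    for listing in listings:
        prices[listing["bedrooms"]].append(listing["price"])
    return {bedrooms: mean(values) for bedrooms, values in sorted(prices.items())}


def average_rent_over_time(listings: List[dict]) -> Dict[str, Dict[int, float]]:
    """Average price per bedroom count for each date (the data behind a line graph)."""
    by_date = defaultdict(list)
    for listing in listings:
        by_date[listing["date"]].append(listing)
    return {
        date: average_rent_by_bedrooms(group)
        for date, group in sorted(by_date.items())
    }
```

The first function would be fed the current listings for one FSA, and the second the historic listings for the same FSA.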

Technologies

  • Python
    • Extraction
      • BeautifulSoup
      • Selenium
    • Transformation
      • Pandas
      • Numpy
      • Regex
  • MongoDB
    • MongoClient
    • Atlas
  • JavaScript
    • D3
    • Leaflet
      • Sidebar
      • Leaflet Panel
      • Heatmap
      • Mapbox
    • Plotly
  • Web
    • HTML
    • CSS
    • Bootstrap