Northwind
Context
Food distribuition is one the most important industries in the United States. There are about 33,000 establishments with a combined revenue of $670 billion. The 50-largest companies, such as McLane Company, Supervalu, Sysco Corp, and US Foodservice, generate 50% of the food distribution revenue.
Knowing your customers is key to food distributors’ success. The demand of their products is highly dependent on changes in demographics and customer preferences. While largest distributors focus on bulk purchasing and economies of scale, smaller companies compete by specializing in certain products or specific geographical areas.
Supermarkets, restaurants, and convenient stores are responding to changing customer preferences by offering more varied, more practical, and more natural products. This is forcing food distributors to deliver a broader offering more frequently. Dot Foots, for instance, recently launched more than 100 different natural specialty manufacturers to respond to these changes. In this competitive environment, food distributors must use data to identify changing consumer trends and opportunities to reduce waste and improve the efficiency of their operations.
Sources: Refrigerated Transported, Food Logistics, Eater.
Overview
In this project, you will analyze the data of a food distributor in the United States and find insights related to customers, employees, products, and revenue. To setup the project, you will need to create a database and import the data. As in the last project, you will use Jupyter, Python, and Pandas, but you will also use Seaborn to create visualizations, and SQL to query the data in the database. At the end you will submit your Jupyter notebook with your findings on a repository on Github.
Vocabulary
- Revenue: an organizational income before expenses. The difference between revenue and profit, is that revenue doesn’t account for expenses, while profit does.
- Database: an organized collection of data stored electonically in a computer system that supports storage and manipulation. There are different types of databases, but mostly they are divided in relational and non-relational.
- ERD: Entity Relationship Diagram, a visual representation of how different sets of data (entities) in a database relate to each other.
- SQL: Structured Query Language, a domain-specific programming language used to access and manipulate relational databases such as Postgresql, MySQL, and Microsoft SQL Server.
- Seaborn: a Python data visualization library based on matplotlib. It is used to draw statistical graphics.
Learning Goals
At the end of this project, students will be able to:
- Create advanced level aggregations.
- Implement subqueries, views, and common table expressions
- Design visualizations using Seaborn.
- Use joins to find data in different tables.
- Calculate business metrics, such as revenue, to deliver business value.
- Translate business requests into insights.
- Create, restore, and connect to a database.
- Use Entity Relational Diagrams to map out relationships between data sets.
- Query data using dates.
Requirements
For full installation instruction and technical requirements click here.
Submission
You will submit your findings via a repository on Github. The repository will be named <first name>-<last name>-northwind
. The repository should contain your Jupyter notebook and SQL file.
Rubric
The following rubric is divided in technical and professional skills. In order to complete this project successfully, students need to achieve 18 points.
Technical Skills
Completion
- 4: The student was able to exceed the technical requirements and deliver additional insights based on the provided data set.
- 3: The student was able to complete the technical requirements for this project.
- 2: The student was able to complete the technical requirements except for one or two missing features.
- 1: The student was not able to complete the technical requirements or does not have a working solution.
Organization
- 4: The student organized his/her notebook, queries, and insights in a clear and logical structure, and included references where needed.
- 3: The student organized his/her notebook, queries, and insigths in a logical structure.
- 2: The student organized his/her notebook and queries in a mostly logical structure.
- 1: The student’s project was not clear nor organized.
Style
- 4: The student followed SQL and Python conventions, including proper naming and composition. Student wrote his/her notebook in a clear and concise way.
- 3: The student followed SQL and Python conventions, including proper naming and composition.
- 2: The student mostly followed SQL and Python conventions bar some omissions.
- 1: The student writing was inconsistent and unclear, and his/her code didn’t follow conventions.
Professional Skills
Communication
- 4: The student communicated with his/her instructors in a professional manner. The student proactively provided updates, and responded to instructors’ requests in time. The student also communicated any setbacks in time and requested help when necessary.
- 3: The student communicated with his/her instructors in a professional manner and responded to instructors’ requests in time.
- 2: The student communicated with his/her instructors in a professional manner.
- 1: The student did not communicated with his/her instructors or did so in a unprofessional way.
Time Management
- 4: The student delivered all the technical requirements and additional insights in time.
- 3: The student delivered all technical requirements in time.
- 2: The student delivered most of technical requirements in time.
- 1: The student did not delivered his technical requirements in time.
Problem Solving
- 4: The student was able to solve his/her own problems by reviewing documentation, obtaining help from peers and mentors, and reaching out to instructors as a last resort.
- 3: The student was able to solve his/her own problems by obtaining help from peers and mentors, and reaching out to instructors.
- 2: The student was able to solve his/her own problems by only reaching out to instructors.
- 1: The student was not able to solve his/her own problems even with additional help.