As we all know, America is a country of diverse cultures. People from all over the world come here to work, study, and live. Among the many elements of culture, food is one of the most interesting to explore because it is so deeply involved in our daily lives. On a large scale, we are interested in how the number of restaurants is distributed across states. We also want to understand how restaurants are distributed by price, rating, number of reviews, and kind, which gives us a rough picture of our own food choices. Going further, we hope to find connections between restaurants based on their kinds.
All of these questions lead us to Yelp, a popular website that displays restaurant information. Our data comes from Yelp's API and its website, and both our analysis and our visualizations are presented in the following sections.
The map of restaurant counts reveals large spatial differences. Coastal states tend to have more restaurants, while inland states have fewer. For the following analysis, we choose three cities to represent western, middle, and eastern America: San Francisco, Albuquerque, and Detroit.
Variable | Description |
---|---|
Area | String. The area the restaurant belongs to. Example: Detroit-Riverside |
Claimed Status | Bool. Whether the restaurant has been claimed by an owner |
Health Inspect | Float. Health Grades |
Id | String. A unique id for each restaurant |
Latitude | Float. Geographic coordinate |
Longitude | Float. Geographic coordinate |
Price | Categorical. $: Inexpensive; $$: Moderate; $$$: Pricey; $$$$: Higher end |
Rating | Float. The rating for the restaurant, range 0-5. |
Related Business | String. The id of another restaurant that Yelp recommends because other customers also viewed it. |
Review | Float. Number of reviews the restaurant has, which indicates its popularity. |
Tag | String. Label for the restaurant. A label can describe the kind of food, such as dessert, or group the restaurant by the region of its cuisine, such as “Chinese”. Each restaurant can have multiple labels. |
Title | String. The name of the restaurant. |
Url | String. The url link of the restaurant. |
At first, we tried to use the Yelp API. Unfortunately, the Yelp API returns at most 20 restaurant records per search call, which is far from enough for analysis. To collect enough data, we turned to scraping the website page by page instead.
In total, we scraped over 20000 pages. During the search process, we found that using “City” as the search term is not a wise choice, because at most 1000 records are displayed for any search term, which severely limits the amount of data available for analysis. Our solution is to split each city into multiple sub-areas; for instance, Detroit is split into Downtown Detroit, Detroit Riverside, etc. Within a small area, the number of restaurants will not exceed the upper bound on search records, so we can get almost all restaurants in a city by combining the records scraped from each small area. This method has a downside: we cannot use it to scrape large cities like NYC, since there is no guarantee that even a small area contains fewer than 1000 records.
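
The combining step can be sketched as follows. This is a minimal illustration, not our actual scraper: the function name `merge_area_results`, the record layout (dictionaries with an `id` key), and the de-duplication by id are assumptions made for the example. It also reports areas whose record count reaches the 1000-record limit, since those areas may be truncated.

```python
from typing import Dict, List, Tuple

SEARCH_CAP = 1000  # display limit per search term, as described above


def merge_area_results(
    results_by_area: Dict[str, List[dict]], cap: int = SEARCH_CAP
) -> Tuple[List[dict], List[str]]:
    """Combine per-area records into one list (hypothetical helper).

    Assumes each record is a dict with an "id" key. A restaurant that
    shows up in two neighbouring areas is kept once, under the first
    area in which it was seen.
    """
    merged: Dict[str, dict] = {}
    capped_areas: List[str] = []
    for area, records in results_by_area.items():
        # An area that reaches the cap may be missing restaurants.
        if len(records) >= cap:
            capped_areas.append(area)
        for record in records:
            merged.setdefault(record["id"], {**record, "area": area})
    return list(merged.values()), capped_areas


if __name__ == "__main__":
    scraped = {
        "Downtown Detroit": [{"id": "a"}, {"id": "b"}],
        "Detroit Riverside": [{"id": "b"}, {"id": "c"}],
    }
    restaurants, capped = merge_area_results(scraped)
    print(len(restaurants), capped)
```

Any area returned in the second list would need to be split further before its records can be trusted to be complete.
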
After getting the data from the web pages, we first reformat some of the results. For instance, we treat the number of reviews as a numeric value and price as a factor. We also drop records with missing values. Finally, we stack all the variables we collected into a single data frame.
For convenience, we group our raw tag set into 9 new categories. The original tags and our groupings are listed below.
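
The cleaning steps could look roughly like the sketch below. It uses plain Python dictionaries in place of a data frame, and the field names (`id`, `price`, `rating`, `review`), the choice of which fields must be present, and the removal of thousands separators are all assumptions for illustration rather than a description of our exact code.

```python
PRICE_LEVELS = ["$", "$$", "$$$", "$$$$"]  # ordered price categories
REQUIRED = ("id", "price", "rating", "review")  # assumed mandatory fields


def clean_records(raw_records):
    """Return records with numeric reviews/ratings and a coded price level.

    Records with a missing or unparseable required field are dropped.
    """
    cleaned = []
    for raw in raw_records:
        # Drop records with missing values.
        if any(raw.get(field) in (None, "") for field in REQUIRED):
            continue
        if raw["price"] not in PRICE_LEVELS:
            continue
        try:
            # Assumption: scraped counts may contain separators ("1,204").
            review = float(str(raw["review"]).replace(",", ""))
            rating = float(raw["rating"])
        except ValueError:
            continue
        cleaned.append(
            {
                **raw,
                "review": review,
                "rating": rating,
                # Integer code 1..4, playing the role of an ordered factor.
                "price_level": PRICE_LEVELS.index(raw["price"]) + 1,
            }
        )
    return cleaned


if __name__ == "__main__":
    sample = [
        {"id": "a", "price": "$$", "rating": "4.5", "review": "1,204"},
        {"id": "b", "price": "", "rating": "3.0", "review": "10"},
    ]
    print(clean_records(sample))
```
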
Categories | Raw Tags |
---|---|
Chinese | Chinese, Cantonese, Szechuan, Shanghainese, DimSum, etc |
Alcohol | WhiskeyBars, ChampagneBars, CocktailBars, Beer, etc |
JapaneseKorean | Japanese, Korean, SushiBars, Izakaya, Teppanyaki, etc |
American | American(New), American(Traditional), FastFood, ChickenWings, etc |
South American(Mexican) | Mexican, Tacos, Tex-Mex, LatinAmerican, Salvadoran, etc |
Southeast Asian | Thai, Laotian, Vietnamese, Malaysian, Singaporean, etc |
Indian | Indian, Bangladeshi, Himalayan/Nepalese, etc |
Europe | Italian, French, Greek, Belgian, etc |
Dessert | IceCream&FrozenYogurt, JuiceBars&Smoothies, Cupcakes, etc |
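
As an illustration of how such a grouping can be applied, the sketch below maps a restaurant's raw tags to the categories in the table. It is a simplified example under stated assumptions: only the raw tags explicitly listed in the table are included (the tags behind each "etc" are not filled in), category spellings are normalized, and the names `CATEGORY_TAGS` and `categorize` are hypothetical.

```python
# Category -> raw tags, copied from the table (only the tags listed there).
CATEGORY_TAGS = {
    "Chinese": ["Chinese", "Cantonese", "Szechuan", "Shanghainese", "DimSum"],
    "Alcohol": ["WhiskeyBars", "ChampagneBars", "CocktailBars", "Beer"],
    "JapaneseKorean": ["Japanese", "Korean", "SushiBars", "Izakaya", "Teppanyaki"],
    "American": ["American(New)", "American(Traditional)", "FastFood", "ChickenWings"],
    "South American(Mexican)": ["Mexican", "Tacos", "Tex-Mex", "LatinAmerican", "Salvadoran"],
    "Southeast Asian": ["Thai", "Laotian", "Vietnamese", "Malaysian", "Singaporean"],
    "Indian": ["Indian", "Bangladeshi", "Himalayan/Nepalese"],
    "Europe": ["Italian", "French", "Greek", "Belgian"],
    "Dessert": ["IceCream&FrozenYogurt", "JuiceBars&Smoothies", "Cupcakes"],
}

# Invert the table so each raw tag points to its category.
TAG_TO_CATEGORY = {
    tag: category for category, tags in CATEGORY_TAGS.items() for tag in tags
}


def categorize(tags):
    """Return the sorted list of categories matched by a restaurant's tags.

    Tags that are not in the lookup are ignored; a restaurant with several
    tags can therefore fall into several categories, or into none.
    """
    return sorted({TAG_TO_CATEGORY[t] for t in tags if t in TAG_TO_CATEGORY})


if __name__ == "__main__":
    print(categorize(["SushiBars", "CocktailBars", "Unknown"]))
```

Because a restaurant can carry multiple labels, the function returns a list of categories rather than a single one.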