Yelp is a popular website where users can search for a business and read user reviews and basic information about it. It helps people make everyday choices about where to eat and shop. Because Yelp only lists information for individual businesses, it is hard for users to go through all the listings and form a high-level view. As a result, it offers little help to people who want to grow their business or improve their services. We therefore collect this information from Yelp and provide more advanced analytics on top of it.
In our analysis, we choose restaurants as the business entity. We want to give restaurant owners an overview of how to choose a site to start or expand a business, how to set the right price, and how to improve their business through attributes such as WiFi and parking.
The distribution of restaurants on the map reveals large spatial differences. Coastal areas tend to have more restaurants, while inland states have fewer. For the analysis demo that follows, we choose three cities: San Francisco, Albuquerque, and Detroit.
Our data comes from Yelp’s API and its website. Both our analysis and our visualizations are presented in the following sections.
Variable | Description |
---|---|
Area | String. The area the restaurant belongs to. Example: Detroit-Riverside |
Claimed Status | Bool. Whether the restaurant has been claimed by an owner |
Health Inspect | Float. Health inspection grade |
Id | String. A unique id for each restaurant |
Latitude | Float. Geographic coordinate |
Longitude | Float. Geographic coordinate |
Price | Categorical. $: Inexpensive; $$: Moderate; $$$: Pricey; $$$$: Higher end |
Rating | Float. The rating for the restaurant, range 0-5. |
Related Business | String. The id of another restaurant that Yelp recommends as one other customers also view. |
Review | Float. Number of reviews the restaurant has, which indicates its popularity. |
Tag | String. Label for the restaurant. A label can describe the kind of food, such as dessert, or group the restaurant by cuisine, such as “Chinese”. Each restaurant can have multiple labels. |
Title | String. The name of the restaurant. |
Url | String. The URL of the restaurant's page. |
At first, we tried to use the Yelp API. Unfortunately, the Yelp API returns at most 20 restaurant records per search call, which is far from enough for analysis. To get enough data, we turned to scraping the website page by page instead.
In total, we scraped over 20,000 pages. During the search process, we found that using the city as the search term is not a wise choice, because any search term displays at most 1,000 records, which severely limits the amount of data we can collect. Our solution is to split each city into multiple sub-areas; for instance, Detroit is split into Downtown Detroit, Detroit Riverside, etc. Within a small area, the number of restaurants does not exceed the upper bound for search records, so we can get almost all restaurants in a city by combining the records scraped from each small area. This method has a downside: we cannot use it to scrape large cities like NYC, since there is no guarantee that even a small area there has fewer than 1,000 records.
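The combining step can be sketched as follows. This is a minimal illustration rather than our actual scraping code: the function name `merge_sub_areas`, the record layout, and the deduplication by `Id` are assumptions (neighbouring sub-areas may list the same restaurant), and the scraping itself is left out. Any sub-area that reaches the 1,000-record cap is flagged, since its results may be truncated.

```python
from typing import Dict, List, Tuple

SEARCH_CAP = 1000  # upper limit of records displayed for one search term


def merge_sub_areas(
    results_by_area: Dict[str, List[dict]]
) -> Tuple[List[dict], List[str]]:
    """Combine per-area records into one city-wide list.

    Records are deduplicated by their "Id" field (an assumption: the same
    restaurant may appear in more than one sub-area). Areas whose record
    count reaches SEARCH_CAP are reported, because they may be truncated.
    """
    merged: Dict[str, dict] = {}
    capped_areas: List[str] = []
    for area, records in results_by_area.items():
        if len(records) >= SEARCH_CAP:
            capped_areas.append(area)
        for record in records:
            # Keep the first copy seen for each restaurant id.
            merged.setdefault(record["Id"], record)
    return list(merged.values()), capped_areas


# Hypothetical records, for illustration only.
demo = {
    "Downtown Detroit": [{"Id": "a1"}, {"Id": "b2"}],
    "Detroit Riverside": [{"Id": "b2"}, {"Id": "c3"}],
}
restaurants, capped = merge_sub_areas(demo)
print(len(restaurants), capped)
```

A non-empty list of capped areas would suggest splitting those areas further before trusting the totals.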
After getting the data from the web pages, we first reformat some of the results. For instance, we treat the number of reviews as a numeric value and price as a factor. We also drop records with missing values. Finally, we stack all the variables into a data frame.
For convenience, we group our raw tag set into 9 new categories. The original tags and our groupings are listed below.
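These cleaning steps might look like the sketch below. It is an illustration under assumptions, not the code we ran: records are plain dictionaries, `clean_record` and `build_table` are hypothetical names, the price factor is encoded as an ordered level from 1 to 4, and an unrecognized price string is treated as a missing value.

```python
from typing import List, Optional

# Ordered price levels, from least to most expensive.
PRICE_LEVELS = ["$", "$$", "$$$", "$$$$"]


def clean_record(raw: dict) -> Optional[dict]:
    """Return a cleaned copy of one record, or None if it should be dropped."""
    # Drop the record if any field is missing or empty.
    if any(value is None or value == "" for value in raw.values()):
        return None
    # Assumption: a price outside the four known levels counts as missing.
    if raw["Price"] not in PRICE_LEVELS:
        return None
    cleaned = dict(raw)
    cleaned["Review"] = float(raw["Review"])  # review count as a numeric value
    cleaned["Price"] = PRICE_LEVELS.index(raw["Price"]) + 1  # factor level 1-4
    return cleaned


def build_table(raw_records: List[dict]) -> List[dict]:
    """Clean every record and keep only the complete ones."""
    cleaned = (clean_record(record) for record in raw_records)
    return [record for record in cleaned if record is not None]


# Hypothetical records, for illustration only.
rows = build_table([
    {"Id": "a1", "Review": "12", "Price": "$$"},
    {"Id": "b2", "Review": None, "Price": "$"},
])
print(rows)
```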
Categories | Raw Tags |
---|---|
Chinese | Chinese, Cantonese, Szechuan, Shanghainese, DimSum, etc |
Alcohol | WhiskeyBars, ChampagneBars, CocktailBars, Beer, etc |
JapaneseKorean | Japanese, Korean, SushiBars, Izakaya, Teppanyaki, etc |
American | American(New), American(Traditional), FastFood, ChickenWings, etc |
South American(Mexican) | Mexican, Tacos, Tex-Mex, LatinAmerican, Salvadoran, etc |
Southeast Asian | Thai, Laotian, Vietnamese, Malaysian, Singaporean, etc |
Indian | Indian, Bangladeshi, Himalayan/Nepalese, etc |
Europe | Italian, French, Greek, Belgian, etc |
Dessert | IceCream&FrozenYogurt, JuiceBars&Smoothies, Cupcakes, etc |
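The grouping in the table can be expressed as a simple lookup, sketched below. The mapping is partial, because it contains only the raw tags named in the table and not those covered by "etc", and the names `CATEGORY_TAGS` and `categorize` are hypothetical. Since a restaurant can carry several tags, it can also fall into several categories.

```python
from typing import Dict, Iterable, List

# Partial mapping: only the raw tags listed explicitly in the table.
CATEGORY_TAGS: Dict[str, List[str]] = {
    "Chinese": ["Chinese", "Cantonese", "Szechuan", "Shanghainese", "DimSum"],
    "Alcohol": ["WhiskeyBars", "ChampagneBars", "CocktailBars", "Beer"],
    "JapaneseKorean": ["Japanese", "Korean", "SushiBars", "Izakaya", "Teppanyaki"],
    "American": ["American(New)", "American(Traditional)", "FastFood", "ChickenWings"],
    "South American(Mexican)": ["Mexican", "Tacos", "Tex-Mex", "LatinAmerican", "Salvadoran"],
    "Southeast Asian": ["Thai", "Laotian", "Vietnamese", "Malaysian", "Singaporean"],
    "Indian": ["Indian", "Bangladeshi", "Himalayan/Nepalese"],
    "Europe": ["Italian", "French", "Greek", "Belgian"],
    "Dessert": ["IceCream&FrozenYogurt", "JuiceBars&Smoothies", "Cupcakes"],
}

# Invert the table so each raw tag points to its category.
TAG_TO_CATEGORY: Dict[str, str] = {
    tag: category for category, tags in CATEGORY_TAGS.items() for tag in tags
}


def categorize(tags: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated categories for a restaurant's raw tags.

    Tags that are not in the partial mapping are ignored.
    """
    return sorted({TAG_TO_CATEGORY[tag] for tag in tags if tag in TAG_TO_CATEGORY})


print(categorize(["SushiBars", "CocktailBars"]))
```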