My name is Miao (Elena) Wang. I am a graduate student at the University of California, Davis, pursuing a master's degree in statistics. I received my undergraduate degree in finance from Shanghai University of Finance and Economics in 2015.
I am enthusiastic about what is hidden behind data, and I am looking for a full-time position as a data analyst or data scientist. I have experience across the data analysis workflow, from data collection and data visualization to machine learning. I am familiar with several software packages (R, MATLAB, SAS) and programming languages (Python and C++).
Tools: Python, JavaScript, Bokeh; Mar. 2017 UC Davis
The United States is a large country with diverse cultures and eating styles across regions, and we wanted a data-driven view of that diversity. Yelp provides information on food and its associated culture, which makes it a good resource to explore. In this project, the restaurant is the entity of interest.
For data collection, we scraped over 20,000 pages on Yelp.
For data exploration, interactive data visualization is the biggest highlight of this project. We plotted the distributions of different restaurant attributes on a map. For the descriptive statistics, we used Bokeh and JavaScript to make interactive plots that show the relationship between a restaurant's price level, rating, and area. Finally, we made a network plot to mine the hidden information within different restaurant categories.
You can visit our webpage for more details by clicking the button below.
Tools: Python, R; May 2016 UC Davis
IMDb is an online database of information related to television programs, films, and so on. Users can leave comments on movies and rate them. We used sentiment analysis to classify a user's rating from a given comment. Metrics related to the movies themselves also matter for a user's rating. Logistic regression, random forest, XGBoost, and SVM were the main machine learning methods used to build the classifier.
The project used two datasets. The first contains 50,000 user comments, with over 80,000 distinct words appearing. We did extensive data manipulation on this raw dataset, such as removing stop words and keeping words with intense sentiment, to get the final bag of words. The second dataset comes from the OMDb API (Open Movie Database) and includes the movie title, director, actors, box office, year of release, genre, and other less interesting variables for the movies appearing in the first dataset.
Before modeling, we optionally applied a tf-idf transformation to the comments dataset. For modeling, we used logistic regression, random forest, XGBoost, and SVM to build classifiers. In the end, we combined three of them into a voting classifier, which reduced the test error to 14%.
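The stop-word removal and bag-of-words step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list, the tokenizer, and the sample comment are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the project is not given.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}

def tokenize(comment):
    """Lowercase a comment and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", comment.lower())

def bag_of_words(comment, stop_words=STOP_WORDS):
    """Count the tokens in one comment, skipping stop words."""
    return Counter(t for t in tokenize(comment) if t not in stop_words)

# Made-up sample comment for illustration only.
sample = "This movie was great, and the acting was great too."
bow = bag_of_words(sample)
```

Each comment becomes a sparse count vector over the remaining vocabulary, which is the form a classifier can consume.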
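To make the tf-idf weighting and the voting step concrete, here is a small sketch in plain Python. It assumes one common tf-idf variant (term frequency divided by document length, idf as log(N / df)) and hard majority voting; the project may have used a different variant or soft voting, and the documents and labels below are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df),
    which is only one of several common tf-idf definitions.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def majority_vote(predictions):
    """Hard voting: return the most common label among the models' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical tokenized comments and per-model predictions.
docs = [["great", "movie"], ["boring", "movie"], ["great", "acting"]]
weights = tf_idf(docs)
vote = majority_vote(["positive", "negative", "positive"])
```

Terms that appear in fewer comments, such as "boring" here, receive a larger weight than terms shared across comments, such as "movie".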
Tools: Python, MongoDB, R; June 2016 - Aug. 2016 Beijing, China
Nowadays, coupons are a good marketing tool that helps businesses increase their profits. Extracted raw data from MongoDB. Applied data cleaning, feature selection, and feature engineering.
Built a model to predict the coupon usage rate on the WeChat social media app for an industry or brand.
Worked directly with the manager to build a smart pricing model that helps calculate the profit of a coupon campaign based on coupon type. Helped companies find the best coupon design strategy, increasing profit by 20% on average.
Tools: Tableau, R; Oct. 2016 – Jan. 2017; Davis
Pulled and integrated data from several different databases using regular expressions and SQL queries. Used data visualization in Tableau to examine water usage trends.
Mapped data according to geographic location. Conducted spatial analysis and built a model to predict missing values on the map.
Built an interactive web application with R Shiny to visualize the data and the model.
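As an illustration of predicting a missing value on a map, the sketch below uses inverse distance weighting (IDW). This is an assumption chosen for simplicity: the project does not state which spatial model was used, and the station coordinates and readings here are hypothetical.

```python
import math

def idw_predict(known, target, power=2):
    """Estimate the value at `target` from (x, y, value) points.

    Each known point is weighted by 1 / distance**power, so nearby
    points influence the estimate more than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0:
            return value  # target coincides with a known point
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical readings: (x, y, water usage).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_predict(stations, (1.0, 0.0))
```

The estimate always falls between the smallest and largest known values, which makes IDW a simple baseline before fitting a more elaborate spatial model.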