Wenkang Wei
My Websites
Coding practice website: https://wenkangwei.gitbook.io/leetcode-notes/
It summarizes my coding practice and implementations of algorithms.
Technical blog: https://wenkangwei.github.io/
It summarizes algorithms and projects.
Source code repository: https://github.com/wenkangwei
It stores my open source projects, source code, and some datasets.
Who am I? My Resume
Machine Learning Fan and Researcher - I’m a second-year master’s student in computer engineering with a minor in Computer Science. So far, I have been doing research on deep learning and data mining. I’m working with Dr. Adam Hoover on human activity (eating) pattern recognition research. I’m also a research assistant in the Clemson-Clair Lab of Dr. Kai Liu, where my research topic is machine learning optimization.
Job Seeker - I’m actively looking for a full-time Data Scientist or Machine Learning Engineer position starting in Summer 2021, after my graduation in May 2021.
My resume can be found Here or on LinkedIn.
You are welcome to email me if you have any suggestions or referrals for me.
Blogger - I find that writing blog posts helps me manage my technical notes and the problems I solved in the past. What’s more, it gives me a platform to share techniques with more people. I’m working on organizing the technical notes and thoughts on problems that I wrote in the past.
Geek - I usually find technical projects to work on in my free time. Some projects are inspired by my daily life, my classes on campus, and my research. To know more about them, Click HERE
Dreamer - I often imagine how our brains work and ponder how AI will collaborate with humans in the future, and how we can borrow human learning behaviors or the architecture of the human brain to design new AI algorithms. Hopefully, the secret behind the human brain can be revealed one day.
Amateur Painter - I’m a fan of Japanese anime, so I sometimes paint anime characters.
Educational Background
I’m a master’s student in computer engineering with a focus area of intelligent systems and pattern recognition. I received my BS degree with an EE major and was named to the Dean’s List at Clemson University. If you are interested in my academic achievements, Click HERE
Work Experience
Machine Learning Research Assistant - Summer 2020-Current
- Proved the convergence and convergence rate of the Multiplicative Update Algorithm (MUA) for the Non-Negative Matrix Factorization problem
- Formulated the matrix factorization problem as a constrained optimization problem
- Applied linear algebra and Lagrange multipliers to simplify the problem, and used Lipschitz gradients and convex optimization to prove the convergence and convergence rate of MUA
- Implemented the MUA and ALS (alternating least squares) algorithms in Google Colab and Matlab to verify the convergence results
- Collaborated and communicated with CS professor Dr. Kai Liu to present the mathematical proof orally
- Wrote a paper in AAAI format using LaTeX (unpublished due to copyright reasons)
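As a rough illustration of the multiplicative update idea behind the MUA mentioned above, here is a minimal sketch. It is not the code from the research project: the function name, its parameters, the random initialization, and the choice of the Frobenius-norm objective are all assumptions made for this example.

```python
import numpy as np


def nmf_multiplicative_update(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0.

    Hypothetical helper for illustration: uses the classic multiplicative
    update rules for the objective ||V - W H||_F^2.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so the updates never get stuck at zero.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps

    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise by a non-negative ratio,
        # so non-negativity is preserved without any projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the reconstruction error to inspect convergence empirically.
        errors.append(np.linalg.norm(V - W @ H, "fro"))
    return W, H, errors
```

Calling it on a small random non-negative matrix, for example `nmf_multiplicative_update(np.random.default_rng(1).random((6, 5)), rank=2)`, and plotting `errors` gives a quick empirical check that the reconstruction error does not increase from one iteration to the next.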
Technical Skills
Programming:
- Python/Jupyter Notebook
- PostgreSQL
- C/C++
- Matlab
- HTML, Markdown, LaTeX
Tools:
- Deep Learning Frameworks: PyTorch / TensorFlow
- Distributed Machine Learning: PySpark, Hadoop MapReduce, MPI
- Data Analysis toolkits: sklearn, pandas, seaborn, etc.
- Platform: Raspberry Pi, Linux, Google Colab, Git, AWS (RDS, EC2)
Theory and Analysis Techniques:
- Feature Engineering, Data Visualization and Preprocessing Techniques, PCA, NLP text processing: Word Embedding, TF-IDF, etc.
- Machine Learning Modeling: Collaborative Filtering, Matrix Factorization, SVM, Decision Tree, Clustering, Convolutional Neural Network, etc.
- Techniques for Model Evaluation and Improvement: Cross-Validation, Ensemble Learning, ROC, AUC, Feature Importance, etc.
Selected Projects
Image Classification
Car Classification using Transfer Learning - Fall 2020
- Constructed a data pipeline in PyTorch to extract and transform the Stanford Cars image dataset (1.96GB dataset with 196 classes)
- Modified and tuned the pre-trained models GoogLeNet, VGG-16, and ResNet-50 to fit the car dataset using early stopping and weight decay
- Improved the test accuracy of the best model to 85% using cross-validation for model selection
Recommendation System
Recommendation System based on MovieLens 25M dataset (PySpark, Hadoop, HDFS, SQL)
- Utilized PySpark to load the MovieLens 25M dataset (25 million ratings) and used SQL to query and analyze the data on the Databricks cluster platform
- Implemented and applied Mapper and Reducer functions on the Hadoop file system to analyze the contribution of different movie genres to ratings
- Applied Collaborative Filtering and Matrix Factorization methods to construct a recommendation system with PySpark
- Achieved a mean squared error of 0.67 and deployed the recommendation system using IPython widgets
KKBox Music Recommendation System
- Utilized Exploratory Data Analysis (EDA) techniques and data visualization to analyze the relationships between features
- Constructed a data pipeline to clean data by filling missing values and converting data types, and to transform data using one-hot encoding, embeddings, etc.
- Implemented and applied a light gradient boosting machine and a wide and deep neural network for the recommendation system
Data Science and NLP
-
- Visualized and analyzed data related to customer churn using the visualization toolkits seaborn and matplotlib
- Preprocessed and transformed categorical data for machine learning model training using pandas and normalization techniques
- Established a data pipeline and ML models (Random Forest, Logistic Regression, SVM, etc.) and evaluated the models using ROC and AUC
- Improved model accuracy from 80% to 86% using model selection, cross-validation, feature selection, and L1 regularization
YouTube Comments Analysis and Pet Owners Classification (PySpark, SQL, Databricks Cluster)
- Utilized PySpark and PostgreSQL to load, query, and explore YouTube comment text data (about 1GB after decompression)
- Built a data pipeline and applied Term Frequency-Inverse Document Frequency (TF-IDF) to transform text data into numerical data
- Applied Logistic Regression, Random Forest, and Gradient Boosting Machine models in PySpark to classify commenters as cat or dog owners
- Achieved 92% prediction accuracy on the test set using grid search and cross-validation
Software Development
- Real-time Signal Visualization System (C++, Qt, GDB)
- Designed, on my own initiative, a visualization software system based on the Qt toolkit and Arduino, using C/C++, to visualize voltage signal data in real time
- Designed GUI components and class modules for the software interface in Qt, and a software framework in C++ to control data visualization behavior, using a queue data structure and Object-Oriented Programming (OOP) techniques
- Integrated, tested, and debugged the GUI components with the software framework using GDB and the Qt IDE
- Wrote technical documentation for the software system on GitHub with a video demo. Link to demo: https://github.com/wenkangwei/SerialPlot
- More projects will be uploaded soon. They are mentioned in my resume and on LinkedIn. Please feel free to contact me via email: wenkanw@g.clemson.edu or LinkedIn if you have any questions or job opportunities for me. Thanks!
My Interests
Research Interests:
Data mining, recommendation systems, NLP, machine learning, deep learning, and their applications.
Other Interests:
Anime, painting, music, badminton, swimming…
My Framework for solving problems / researching
You are welcome to contact me at wenkangwei917@gmail.com or via https://github.com/wenkangwei if you have any suggestions on my projects or my blog.