Manfred Tan
Welcome to my personal portfolio. Listed below are projects relating to software development, data analysis, data science, visualization, and other topics; feel free to take a look around.
Projects
2025 Travelers UMC
Built an end-to-end ML system to predict subrogation for insurance claims using imbalanced tabular data; engineered features, applied class imbalance strategies, and optimized decision thresholds. Evaluated statistical and ML models, including logistic regression, gradient-boosted trees (LightGBM, XGBoost), and state-of-the-art frameworks such as TabNet and AutoGluon.
- Python
- Data Analysis
- Machine Learning
- Prediction
- scikit-learn
- AutoGluon
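
A minimal sketch of decision-threshold tuning on imbalanced data. It assumes F1 as the target metric and uses made-up validation scores; the project's actual metric, models, and data are not shown here, and the function names are hypothetical.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Try each distinct score as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Toy validation set (made up): 2 positives among 8 claims.
y_true = [0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.45]

threshold = best_threshold(y_true, scores)
tuned_f1 = f1_at_threshold(y_true, scores, threshold)
default_f1 = f1_at_threshold(y_true, scores, 0.5)
```

On this toy data the default 0.5 cutoff flags no claims at all, which is the usual reason to tune the threshold when positives are rare.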
NYC Buddy - Chatbot with RAG
Developed a chatbot using Retrieval-Augmented Generation (RAG) to deliver context-aware responses. Employed Groq (llama3-70b-8192) for fast model inference and Pinecone for efficient vector similarity searches. Processed community recommendations into embeddings with HuggingFace encoders (dwzhu/e5-base-4k) to generate personalized responses, and deployed the application on HuggingFace Spaces with a Gradio interface.
- Python
- LLM
- Groq
- Encoders
- Pinecone
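
A rough sketch of the retrieval step in a RAG pipeline, using plain cosine similarity over an in-memory list in place of Pinecone. The three-dimensional vectors and the recommendation texts are made up for illustration; in the project, embeddings come from an encoder model and the prompt goes to an LLM, neither of which is called here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, k=2):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical index: toy vectors stand in for real encoder embeddings.
index = [
    {"text": "Try the dumplings in Flushing.", "vector": [0.9, 0.1, 0.0]},
    {"text": "The High Line is best at sunset.", "vector": [0.1, 0.9, 0.1]},
    {"text": "Pizza slices are cheap in the East Village.", "vector": [0.8, 0.2, 0.1]},
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of a food question

passages = retrieve(query_vec, index, k=2)
prompt = build_prompt("Where should I eat?", passages)
```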
Cherry Blossom Bloom Predictions
Developed a time series model to predict cherry blossom bloom dates using historical temperature patterns. Applied FastDTW for dynamic time warping analysis, matching temperature sequences to historical bloom cycles to estimate upcoming bloom dates. Submission for the 2025 GMU Prediction Competition.
- Python
- Data Analysis
- Time Series Forecasting
- FastDTW
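
A small sketch of the matching idea, using the classic dynamic-programming form of dynamic time warping rather than FastDTW (which approximates the same distance in less time). The temperature sequences and year labels are made up, and `closest_year` is a hypothetical helper, not the competition code.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def closest_year(current, history):
    """Pick the historical year whose temperature sequence warps most cheaply onto the current one."""
    return min(history, key=lambda year: dtw_distance(current, history[year]))


# Toy temperature sequences (made up), one value per time step.
history = {
    "year_a": [2, 4, 7, 11, 15],
    "year_b": [1, 1, 2, 4, 7],
}
current = [2, 2, 4, 7, 11]

match = closest_year(current, history)
```

The bloom date recorded for the matched year can then serve as a starting estimate for the current season.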
CSAS 2025 Data Challenge - Baseball Analytics
Analyzed the consistency of bat speed and swing length among MLB batters using novel data from the 2024 season. Applied chi-square and binomial tests to explore the impact of game conditions on swing deviations. Submission for the CSAS 2025 Data Challenge.
- Python
- R
- Data Analysis
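
For illustration, an exact two-sided binomial test written with only the standard library. The counts are made up (9 of 10 swings deviating in one direction, against a null of 0.5) and are not results from the challenge data.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    observed = binomial_pmf(k, n, p)
    probabilities = [binomial_pmf(i, n, p) for i in range(n + 1)]
    # Small relative tolerance so floating-point ties count as equal.
    return min(1.0, sum(q for q in probabilities if q <= observed * (1 + 1e-7)))


# Hypothetical counts: 9 of 10 swings deviate above the batter's baseline.
p_value = binomial_test_two_sided(9, 10, p=0.5)
```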
Regression Modeling of Pedestrian Activity by Temperature
Modeled the relationship between daily temperature and pedestrian activity in Melbourne, Australia, using regression techniques in R. Implemented simple and polynomial regression models and used cross-validation to select the best-fit model.
- Python
- R
- Data Analysis
- Regression Modeling
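
A sketch of choosing between a linear and a quadratic fit by cross-validation, written in plain Python rather than R so it runs without extra packages. The data are synthetic (a noisy parabola), the fold assignment is a simple interleave, and all function names are hypothetical; none of this is the Melbourne dataset or the original analysis.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via the normal equations."""
    size = degree + 1
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(x ** (r + c) for x in xs) for c in range(size)]
        + [sum(y * x**r for x, y in zip(xs, ys))]
        for r in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def cv_mse(xs, ys, degree, folds=4):
    """Mean squared error on held-out points, averaged over interleaved folds."""
    errors = []
    for f in range(folds):
        pairs = list(enumerate(zip(xs, ys)))
        train = [(x, y) for i, (x, y) in pairs if i % folds != f]
        held_out = [(x, y) for i, (x, y) in pairs if i % folds == f]
        coeffs = fit_polynomial([x for x, _ in train], [y for _, y in train], degree)
        errors += [(predict(coeffs, x) - y) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)


# Synthetic data (made up): activity peaks at a mid-range temperature.
xs = list(range(12))
ys = [40 - (x - 6) ** 2 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

scores = {degree: cv_mse(xs, ys, degree) for degree in (1, 2)}
best_degree = min(scores, key=scores.get)
```

Comparing held-out error rather than training error keeps a higher-degree model from being picked just because it has more parameters to fit the noise.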