Manfred Tan

Welcome to my personal portfolio. Listed below are projects relating to software development, data analysis, data science, visualization, and other topics; feel free to take a look around.

Projects

Visualizing the Digital Landscape

A brief data visualization project exploring the modern attention economy in the era of short-form content.

  • HTML
  • CSS
  • JavaScript
  • Data Visualization

2025 Travelers UMC

Built an end-to-end ML system to predict subrogation for insurance claims using imbalanced tabular data; engineered features, applied class imbalance strategies, and optimized decision thresholds. Evaluated statistical and ML models, including logistic regression, gradient-boosted trees (LightGBM, XGBoost), and state-of-the-art frameworks such as TabNet and AutoGluon.

  • Python
  • Data Analysis
  • Machine Learning
  • Prediction
  • scikit-learn
  • AutoGluon
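
The threshold optimization step mentioned above can be illustrated with a minimal sketch. It assumes F1 is the metric being maximized, which the project description does not state, and it uses made-up labels and scores rather than real claim data. The function names are hypothetical.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Try each distinct score as a cutoff and keep the one with the best F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Toy, made-up data: 2 positives out of 8 cases (imbalanced).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.30, 0.45, 0.40, 0.70]

threshold = best_threshold(y_true, scores)
best_f1 = f1_at_threshold(y_true, scores, threshold)
print(threshold, best_f1)
```

On this toy data the default cutoff of 0.5 would miss one of the two positives, while the tuned cutoff recovers both at the cost of one false positive. In practice the cutoff would be chosen on a validation split, not on the data used for the final evaluation.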

NYC Buddy - Chatbot with RAG

Developed a chatbot using Retrieval-Augmented Generation (RAG) to deliver context-aware responses. Used Groq (llama3-70b-8192) for fast model inference and Pinecone for efficient vector similarity search. Encoded community recommendations into embeddings with a HuggingFace encoder (dwzhu/e5-base-4k) to generate personalized responses, and deployed the application on HuggingFace Spaces with a Gradio interface.

  • Python
  • LLM
  • Groq
  • Encoders
  • Pinecone
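
The retrieval step of a RAG pipeline like the one described above can be sketched in plain Python. This is an illustration only: it replaces Pinecone with an in-memory list, replaces the encoder with hand-written three-dimensional vectors, and uses made-up recommendation texts. The names `top_k` and `build_prompt` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up texts and vectors standing in for encoded recommendations.
index = [
    ("Try the bagels on the Upper West Side.", [0.9, 0.1, 0.0]),
    ("The High Line is a nice walk at sunset.", [0.1, 0.9, 0.1]),
    ("Take the ferry for skyline views.", [0.0, 0.6, 0.8]),
]
query_vec = [0.2, 0.8, 0.3]  # stand-in for an encoded question

passages = top_k(query_vec, index, k=2)
prompt = build_prompt("What should I do outside?", passages)
print(prompt)
```

A vector database performs the same ranking over a much larger index using approximate nearest-neighbor search; the resulting prompt is what gets sent to the language model.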

Cherry Blossom Bloom Predictions

Developed a time series model to predict cherry blossom bloom dates using historical temperature patterns. Applied FastDTW for dynamic time warping analysis, matching temperature sequences to historical bloom cycles to estimate upcoming bloom dates. Submission for the 2025 GMU Prediction Competition.

  • Python
  • Data Analysis
  • Time Series Forecasting
  • FastDTW
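
The sequence-matching idea behind this project can be sketched with classic dynamic time warping. Note the swap: this is the exact quadratic-time algorithm, not the FastDTW approximation the project used, and the temperature values and year labels are made up for illustration.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def closest_year(current, history):
    """Return the key of the historical sequence nearest to the current one."""
    return min(history, key=lambda year: dtw_distance(current, history[year]))


# Made-up temperature sequences; the labels are placeholders.
history = {
    "year_a": [2, 4, 7, 11, 15],
    "year_b": [5, 9, 14, 18, 21],
}
current = [2, 2, 4, 7, 11, 15]  # same shape as year_a, with a slower start

match = closest_year(current, history)
print(match)
```

Because DTW allows one sequence to stretch against the other, a season that warms on the same curve but starts slowly still matches its historical counterpart. The bloom date recorded for the matched year could then serve as a starting estimate.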

CSAS 2025 Data Challenge - Baseball Analytics

Analyzed bat speed and swing length consistency in MLB batters using novel data from the 2024 season. Applied chi-square and binomial tests to explore the impact of game conditions on swing deviations. Submission for the CSAS 2025 Data Challenge.

  • Python
  • R
  • Data Analysis
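
As a rough illustration of the binomial test mentioned above, here is an exact two-sided test written with only the standard library. The scenario is hypothetical and the counts are made up: suppose 8 of a batter's 10 swings in some game condition were longer than that batter's median swing, where 5 would be expected by chance.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test_two_sided(successes, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(successes, n, p)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= observed * tolerance
    )


# Made-up counts: 8 "long" swings out of 10, null probability 0.5.
p_value = binomial_test_two_sided(8, 10)
print(p_value)
```

With so few swings, even an 8-of-10 split is not significant at the usual 0.05 level, which is why this kind of test needs a reasonable sample per batter and condition.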

Regression Modeling of Pedestrian Activity by Temperature

Modeled the relationship between daily temperature and pedestrian activity in Melbourne, Australia, using regression techniques in R. Implemented simple and polynomial regression models and used cross-validation to select the best-fitting model.

  • Python
  • R
  • Data Analysis
  • Regression Modeling
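
The model-selection step described above can be sketched as follows. The project itself used R; this is a Python illustration with made-up data in which activity peaks at a moderate temperature, and the helper names are hypothetical. It compares a linear and a quadratic fit by 4-fold cross-validated mean squared error.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations; coefficients are lowest power first."""
    size = degree + 1
    # Augmented matrix [X^T X | X^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def cv_mse(xs, ys, degree, folds=4):
    """Mean squared error on held-out points, averaged over all folds."""
    errors = []
    pairs = list(zip(xs, ys))
    for fold in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != fold]
        test = [p for i, p in enumerate(pairs) if i % folds == fold]
        coeffs = fit_polynomial([x for x, _ in train], [y for _, y in train], degree)
        errors.extend((predict(coeffs, x) - y) ** 2 for x, y in test)
    return sum(errors) / len(errors)


# Made-up data: counts follow an inverted parabola in temperature.
temps = list(range(12))
counts = [40 - (t - 6) ** 2 for t in temps]

best_degree = min((1, 2), key=lambda d: cv_mse(temps, counts, d))
print(best_degree)
```

Scoring each candidate on held-out folds, not on the training fit, is what keeps a higher-degree polynomial from winning simply because it has more parameters.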

Cryptography Algorithm

An exploration of cryptography techniques and ciphers implemented in base Python and Java, with no cryptography libraries. The ciphers can be tested through the web interface.

  • Python
  • Java
  • Flask
  • Cryptography
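
To give a flavor of a cipher written in base Python, here is a minimal Caesar shift. The project description does not list which ciphers it implements, so this one is chosen purely as an illustration, and `caesar_shift` is a hypothetical name.

```python
import string


def caesar_shift(text, shift):
    """Rotate ASCII letters by `shift` positions; leave other characters unchanged."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


ciphertext = caesar_shift("Hello, World!", 3)
plaintext = caesar_shift(ciphertext, -3)  # decrypting is shifting back
print(ciphertext, plaintext)
```

A Caesar shift offers no real security, since there are only 25 nontrivial keys to try, but it is a common starting point before moving on to stronger ciphers.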