About Me

Hi, I’m Bobby Zhu, a recent Data Science graduate from UC San Diego with experience in analytics across healthcare, business, and policy.

I specialize in building ETL pipelines and transforming data into actionable insights. My projects span predictive healthcare modeling, business strategy analytics, and policy analysis. I thrive in collaborative, real-world problem-solving environments.

I’m eager to apply my skills and passion for data-driven solutions to impactful projects. Let’s connect!

  • Data Analyst
    SQL, Data Cleaning, Data Visualization, BI Tools
  • Data Engineer
    ETL Development, Big Data Technologies, Cloud Infrastructure
  • Machine Learning Engineer
    Machine Learning Frameworks, Deep Learning, Model Deployment
  • Frontend Developer
    HTML, CSS, JavaScript, React, Version Control
  • 2023.01-Current
    Teaching Assistant at Halıcıoğlu Data Science Institute
  • 2024.06-2024.08
    Data Analyst Intern at California Policy Lab
  • 2023.06-2024.06
    Data Engineer Intern at Scripps Research
  • 2023.07-2024.01
    Data Analyst Intern at IMD Business School
  • 2025.03
    B.S. Data Science at UC San Diego
    Minor in Business Economics

My Projects

Latent Classes of Sepsis Patients

This study analyzed over 36,000 EHRs to perform latent class analysis, identifying patient subgroups based on clinical patterns and assessing their associated sepsis risks.

Data Pipeline for Image Extraction & Predictive Models of Cardiac Disease

This project utilizes a large-scale MRI image dataset to extract key imaging features and applies these features to predictive models for diagnosing cardiac disease.

Future Readiness Score

This project utilized sentiment analysis of company news and machine learning models to help organizations assess their readiness for future business challenges and trends.

FinEmotionFusion

FinEmotionFusion is an emotion detection system designed for financial phone calls, leveraging early fusion of audio and text modalities to enhance accuracy and context-awareness.

Resume

Bobby Zhu

San Diego, CA, 92122, United States

bobbyzhu.work@gmail.com | linkedin.com/in/bobby-zhu | github.com/Bobby-Zhu

Education

University of California San Diego

Major: Data Science | Minor: Business Economics

Sept 2021 – March 2025

GPA: 3.88/4.0

  • Coursework: Data Management, Systems for Scalable Analytics, Machine Learning, Deep Learning, Representation Learning, Recommender Systems, Data Visualization

Experience

Data Analyst Intern

California Policy Lab, San Diego, CA

June 2024 – August 2024

  • Developed an ETL pipeline for 800GB of consumer credit panel data using MySQL and R, reducing data processing time by 30%.
  • Built linear regression and difference-in-differences models to evaluate the long-term impact of natural disaster compensation on victim credit performance, supporting data-driven policy decisions.
  • Conducted a literature review and adapted existing methodologies to analyze wildfire victim compensation data for a novel use case.

Data Engineer Intern

Scripps Research Translational Institute, San Diego, CA

June 2023 – June 2024

  • Engineered a data processing pipeline for 80GB of cardiac images, achieving 91% pixel-wise segmentation accuracy.
  • Trained a predictive model using XGBoost, achieving 82% accuracy and an AUC of 0.77.
  • Developed unit tests for data pipelines, streamlining workflows and improving reliability.

Data Analyst Intern

IMD Business School, Remote

July 2023 – January 2024

  • Implemented a business prediction index by applying sentiment analysis to corporate news data from six industries using Sklearn, Pandas, and a fine-tuned BERT model, enabling accurate forecasts of industry trends and improving strategic decision-making.
  • Automated multi-source data aggregation by integrating Python scripts with 4 external APIs, optimizing the collection of news and stock data, and reducing processing time by 70%.
  • Designed interactive dashboards to visualize key performance indicators (KPIs) and predictive trends for stakeholders using Plotly, enhancing report delivery efficiency by 40%.

Research

Data-Driven Suicide Prevention Initiative (PRISM)

With Dr. Danks & Dr. Wayne

Sept 2023 – Present

  • Designed a modularized data integration pipeline to automate ETL processes for 500+ files.
  • Led explainable AI efforts using Random Forest and SHAP values to identify key features from UMAP embeddings of suicide data.

Teaching Experience

Undergraduate Teaching Assistant

Halıcıoğlu Data Science Institute, San Diego, CA

Jan 2023 – Present

  • Assisted 800+ students by hosting office hours and discussion sections, explaining concepts in Python, data structures, algorithms, and PostgreSQL.
  • Automated testing for 70% of assignments using Pytest, developing over 20 tests and reducing grading time by 85%.
  • Designed and presented an automated data transformation pipeline leveraging AWS S3, Lambda, and RDS, streamlining data processing workflows for instructional purposes.

Projects

AWS Cloud Data Warehouse for KPI Analysis

MySQL, Redshift

Aug 2024 – Nov 2024

  • Developed an ETL workflow to extract data from MySQL and S3, transform it with AWS Glue, and load it into Redshift, reducing data processing time by 40%.
  • Created 7 SQL scripts to automate KPI calculations, improving forecasting accuracy by 20%.
  • Leveraged Apache Superset to visualize KPI results, creating interactive dashboards for stakeholders.

Latent Class Classification for Sepsis Patients

Python, PostgreSQL, R

Sept 2024 – Present

  • Configured a Dev Container for a team of six, integrating Miniconda and PostgreSQL for a consistent development environment.
  • Developed 10 SQL scripts to preprocess patient groups and calculate critical illness scores, including SOFA and LCA ICU metrics.
  • Leveraged the poLCA library to perform Latent Class Analysis, achieving AUC scores above 0.95 for all classes.

Triple C Club Website Development

React Native, Git

Jan 2023 – Apr 2023

  • Developed 4 reusable React Native components for the Triple C Club’s static website, enhancing frontend modularity.
  • Collaborated with a team to manage and develop features across multiple branches using Git, ensuring high-quality code.

Skills

  • Languages: Python, SQL, Java, JavaScript, R, HTML, CSS
  • Technologies: Python DS stack (e.g., NumPy, Pandas, Sklearn), PyTorch, Regex, PostgreSQL, MySQL, Spark, Hive, Hadoop, AWS, HPC, Linux/Unix Commands, GitHub

Contact Me

bobbyzhu.work@gmail.com

858-291-3844

Download CV