Resume

You can get a PDF version of my resume here.

Education

New York University – May 2020
- Master of Science in Data Science (GPA: 3.96/4.00)
- Selected coursework: Deep Learning, Natural Language Processing, Probabilistic Time Series Modeling, Natural Language Understanding and Computational Semantics, Text as Data, Responsible Data Science, Machine Learning, Big Data, Optimization and Computational Linear Algebra
University of California, Los Angeles – June 2018
- Bachelor of Science in Mathematics of Computation with Minor in Neuroscience
- Selected coursework: Information and Power, American Television History, Algorithms in Bioinformatics, Mathematical Modeling

Skills and qualifications


Programming languages	Python, R, MATLAB, C++, HTML, JavaScript
Python library proficiencies	PyTorch, TensorFlow, Scikit-learn, Pandas, Matplotlib, NumPy, SciPy
Big data tools	SQL, PySpark, MapReduce
Design and typesetting	LaTeX, Markdown, WordPress, Adobe InDesign

Selected projects

Capstone: Applying a General Language Model to Medical Text – Applied multilabel text classification strategies to dataset of 6M+ NYU Langone Health medical notes using deep natural language representation models in PyTorch. We experimented with benefits of fine-tuning in a transfer learning context for transformer-based models such as BERT and XLNet.
Data Science: Text Classification with News Articles – Compared sentiment analysis methods, including Naive Bayes, bag-of-words logistic regression, BOW support vector machines (SVM), and neural networks, for dataset of articles about technology companies from The New York Times.
Big Data: Recommender system with PySpark – Developed recommendation model using alternating least squares (ALS) for implicit feedback in PySpark on Last.fm dataset. We experimented with cold start problem using a latent feature regression model for unknown items.
Natural Language Understanding: Building a Semantic Parser to Handle Queries about Song Data – Built a semantic parser using the SEMPRE framework which can answer queries about a subset of the Million Song Dataset.
Numerical Linear Algebra: Implementing an Efficient Matrix Completion Algorithm in Python – Implemented SoftImpute-ALS algorithm in Python based on the original paper with testing on MovieLens-100K for recommendations. We compared RMSE results with those obtained by block coordinate descent.

Work and research experience

Jan. 2019 - Dec. 2019: Data scientist intern at NBCUniversal Media
- Statistical testing and analysis
  - Conducted statistical testing to measure differences in viewer-per-viewing-household age group distributions over time, under both independence and autocorrelation assumptions
  - Developed Retool application to visualize statistical trends and simulate reallocation across the distribution, with presentation to Corporate Decision Sciences senior leadership
- Data engineering and transformation
  - Prototyped PySpark methods to aggregate and transform Nielsen base data with 1M+ rows for modeling
  - Methods were adopted and put into production pipeline by data engineering team
- Developing forecasting models
  - Fine-tuned linear regression models for forecasting NBC viewer demographic distributions on a single-show level
  - Produced 18-month forecasts using Python machine learning methods (ARIMA, linear regression, random forest) for shortform digital content on platforms such as YouTube and Hulu
- Evaluating forecasting models
  - Developed processes using MLflow and R to save 800+ forecast models, evaluate based on MAPE, APE, visualization, anomaly detection, etc., and update MySQL tables for the models
  - Designed interactive Retool interface for senior data scientists to track model performance, which I folded into the production process
Sept. 2017 - Aug. 2018: Undergraduate research assistant studying data on homelessness in Los Angeles with UCLA Mathematics Department
- Extracted and cleaned data from Google Maps Geocoding API and public datasets (American Community Survey, LA city data from DataLA)
- Visualized and analyzed Los Angeles city census tracts with principal component analysis, nonnegative matrix factorization, geographic mapping, and correlation analysis
- Implemented and evaluated machine learning models (linear regression, logistic regression) in Python and TensorFlow for predicting changes in homeless populations
- Experimented with simulations and single-layer artificial neural networks in MATLAB and TensorFlow
- Communicated results to UCLA faculty and students through multiple presentations and reports
Sept. 2017 - Nov. 2017: Customer data analytics intern at GuidedChoice
- Produced visualizations and demographic breakdowns of the State of Florida account for a retirement planning financial services firm
June 2015 - Dec. 2017: Research assistant for Swartz Center for Computational Neuroscience at University of California, San Diego
- Participated in design and proposal of experiment on how competition modulates neural correlates of insight
- Coded Python scripts to collect subject response times and psychological survey data
- Developed MATLAB code for preprocessing subject electroencephalogram (EEG) data using EEGLAB software and independent component analysis
- Supported grant applications with detailed plots and statistical trend information

Journalism and leadership experience

June 2016 - June 2017: Copy chief at The Daily Bruin
- Led weekly meetings with eight slot editors and monthly training sessions with 30 contributors
- Responsible for ensuring all content for The Daily Bruin (~9,000 copies/day circulation) and DailyBruin.com (~400,000 views/month) is edited for clarity, accuracy, and style, and has concise and engaging headlines
- Developed detailed copy-editing schedules for nearly 150 days of production, breaking, and longform journalistic content
- Contributed to comprehensive Daily Bruin style guide for grammar, style, and sensitivity
- Maintained all the same duties as a slot editor
March 2015 - June 2016: Slot editor at The Daily Bruin
- Edited all content for one section of the paper two days per week
- Participated in regular meetings on style and terminology related to racial, gender, and cultural sensitivity

Derek B. Yen
