Resume
You can get a PDF version of my resume here.
Education
- New York University – May 2020
  - Master of Science in Data Science (GPA: 3.96/4.00)
  - Selected coursework: Deep Learning, Natural Language Processing, Probabilistic Time Series Modeling, Natural Language Understanding and Computational Semantics, Text as Data, Responsible Data Science, Machine Learning, Big Data, Optimization and Computational Linear Algebra
- University of California, Los Angeles – June 2018
  - Bachelor of Science in Mathematics of Computation with Minor in Neuroscience
  - Selected coursework: Information and Power, American Television History, Algorithms in Bioinformatics, Mathematical Modeling
Skills and qualifications
| Programming languages | Python, R, MATLAB, C++, HTML, JavaScript |
| Python library proficiencies | PyTorch, TensorFlow, Scikit-learn, Pandas, Matplotlib, NumPy, SciPy |
| Big data tools | SQL, PySpark, MapReduce |
| Design and typesetting | LaTeX, Markdown, WordPress, Adobe InDesign |
Selected projects
- Capstone: Applying a General Language Model to Medical Text – Applied multilabel text classification strategies to a dataset of 6M+ NYU Langone Health medical notes using deep natural language representation models in PyTorch. We experimented with the benefits of fine-tuning in a transfer learning context for transformer-based models such as BERT and XLNet.
- Data Science: Text Classification with News Articles – Compared sentiment analysis methods, including Naive Bayes, bag-of-words (BOW) logistic regression, BOW support vector machines (SVM), and neural networks, on a dataset of articles about technology companies from The New York Times.
- Big Data: Recommender system with PySpark – Developed a recommendation model using alternating least squares (ALS) for implicit feedback in PySpark on the Last.fm dataset. We experimented with the cold-start problem using a latent feature regression model for unknown items.
- Natural Language Understanding: Building a Semantic Parser to Handle Queries about Song Data – Built a semantic parser using the SEMPRE framework that can answer queries about a subset of the Million Song Dataset.
- Numerical Linear Algebra: Implementing an Efficient Matrix Completion Algorithm in Python – Implemented the SoftImpute-ALS algorithm in Python based on the original paper, with testing on MovieLens-100K for recommendations. We compared RMSE results with those obtained by block coordinate descent.
Work and research experience
- Jan. 2019 - Dec. 2019: Data scientist intern at NBCUniversal Media
  - Statistical testing and analysis
    - Conducted statistical testing to measure differences in viewer-per-viewing-household age group distributions over time, under both independence and autocorrelation assumptions
    - Developed a Retool application to visualize statistical trends and simulate reallocation across the distribution, with presentation to Corporate Decision Sciences senior leadership
  - Data engineering and transformation
    - Prototyped PySpark methods to aggregate and transform Nielsen base data with 1M+ rows for modeling
    - Methods were adopted and put into the production pipeline by the data engineering team
  - Developing forecasting models
    - Fine-tuned linear regression models for forecasting NBC viewer demographic distributions on a single-show level
    - Produced 18-month forecasts using Python machine learning methods (ARIMA, linear regression, random forest) for shortform digital content on platforms including YouTube and Hulu
  - Evaluating forecasting models
    - Developed processes using MLflow and R to save 800+ forecast models, evaluate them based on MAPE, APE, visualization, anomaly detection, etc., and update MySQL tables for the models
    - Designed an interactive Retool interface for senior data scientists to track model performance, which I folded into the production process
- Sept. 2017 - Aug. 2018: Undergraduate research assistant studying data on homelessness in Los Angeles with UCLA Mathematics Department
  - Extracted and cleaned data from Google Maps Geocoding API and public datasets (American Community Survey, LA city data from DataLA)
  - Visualized and analyzed Los Angeles city census tracts with principal component analysis, nonnegative matrix factorization, geographic mapping, and correlation analysis
  - Implemented and evaluated machine learning models (linear regression, logistic regression) for predicting changes in homeless populations in Python and TensorFlow
  - Experimented with simulations and single-layer artificial neural networks in MATLAB and TensorFlow
  - Communicated results to UCLA faculty and students through multiple presentations and reports
- Sept. 2017 - Nov. 2017: Customer data analytics intern at GuidedChoice
  - Produced visualizations and demographic breakdowns of the State of Florida account for a retirement-planning financial services firm
- June 2015 - Dec. 2017: Research assistant for Swartz Center for Computational Neuroscience at University of California, San Diego
  - Participated in design and proposal of experiment on how competition modulates neural correlates of insight
  - Coded Python scripts to collect subject response times and psychological survey data
  - Developed MATLAB code for preprocessing subject electroencephalogram (EEG) data using EEGLAB software and independent component analysis
  - Supported grant applications with detailed plots and statistical trend information
Journalism and leadership experience
- June 2016 - June 2017: Copy chief at The Daily Bruin
  - Led weekly meetings with eight slot editors and monthly training sessions with 30 contributors
  - Ensured all content for The Daily Bruin (~9,000 copies/day circulation) and DailyBruin.com (~400,000 views/month) was edited for clarity, accuracy, and style, and had concise and engaging headlines
  - Developed detailed copy-editing schedules for nearly 150 days of production, breaking, and longform journalistic content
  - Contributed to comprehensive Daily Bruin style guide for grammar, style, and sensitivity
  - Maintained all the same duties as a slot editor
- March 2015 - June 2016: Slot editor at The Daily Bruin
  - Edited all content for one section of the paper two days per week
  - Participated in regular meetings on style and terminology related to racial, gender, and cultural sensitivity
