---
layout: default
title: Raw Resume
---
<div id="raw-resume-content">
<div>
<h2 class="children-center">{{site.name}}</h2>
</div>
<div>
<div class="children-center">
<div class="inline-block">{{site.phone}}</div> |
<div class="inline-block"><a href="mailto:{{site.email}}">{{site.email}}</a></div> |
<div class="inline-block">{{site.location}}</div> |
<div class="inline-block"><a href="{{site.linkedin}}"><b>LinkedIn</b></a></div> |
<div class="inline-block"><a href="{{site.github}}"><b>GitHub</b></a></div> |
<div class="inline-block"><a href="{{site.website}}"><b>Technical</b></a></div>
</div>
</div>
<hr>
<div style="width: 100%;">
<h5>Technical Skills</h5>
<div style="width: 50%;float: left;">
<div>- <b>Code: </b>Python, Bash, SQL, Markdown, PowerShell, Java, C/C++</div>
<div>- <b>CI/CD: </b>GitHub Actions, Azure DevOps, Jenkins, ArgoCD</div>
<div>- <b>Cloud/IaC: </b>AWS, Azure, Terraform, Packer, Ansible</div>
</div>
<div style="width: 50%;margin-left: 50%;">
<div>- <b>Config: </b>cloud-init, SaltStack</div>
<div>- <b>Container: </b>Docker, Podman, Kubernetes (kOps, kubectl, Helm)</div>
<div>- <b>Observability: </b>OpenTelemetry, CloudWatch, Datadog, Grafana</div>
</div>
</div>
<hr>
<div>
<h5>Experience</h5>
<div class="space-between bold-text">
<div>DevOps Engineer</div>
<div>July 2021 - Present</div>
</div>
<div class="space-between">
<div><b>DataJoint</b> - <span class="italic-text">Provides science operations for neuroscience research.</span></div>
<div>Houston, TX</div>
</div>
<div class="exp-content">
<ul>
<li><b>DataJoint Works</b> - <span class="italic-text">A SaaS platform to empower scientists to design and operate data pipelines for their experiments and analysis in a more efficient, scalable, valid and reproducible way.</span> [<a href="https://datajoint.com/works">Details</a>]</li>
<ul>
<li>Administered DataJoint's and several customers' <b>AWS</b> accounts with <b>Infrastructure as Code</b> tools such as <b>Terraform</b>.</li>
<li>Configured VPC, Subnet, IAM, S3, EFS, EC2, RDS, Lambda, SQS, SNS, SES, CloudWatch, Route 53 and Secrets Manager.</li>
<li>Provisioned and maintained production and QA <b>Kubernetes</b> clusters with <b>kOps</b> and <b>Helm</b>.</li>
<li>Architected and implemented an ephemeral compute cluster with <b>Packer</b> and <b>Terraform</b> to support <b>CPU</b> and <b>GPU</b> workloads.</li>
<li>Developed <b>CI/CD</b> pipelines with <b>GitHub Actions</b> starter and reusable workflows to automate build, test and deployment.</li>
<li>Integrated <b>JupyterHub</b> and a kernel gateway as part of the <b>internal developer portal</b>.</li>
<li>Implemented customer onboarding API with <b>Flask</b>, <b>SQL</b>, <b>bash</b>, <b>boto3</b> and <b>Terraform</b> to automate infrastructure provisioning.</li>
<li>Introduced <b>OpenTelemetry</b> to the team and integrated observability with <b>CloudWatch</b>, <b>Datadog</b> and the <b>LGTM stack (Grafana)</b>.</li>
<li>Implemented <b>single sign-on</b>, <b>role-based access control</b>, <b>secrets management</b> and <b>vulnerability scanning</b> for security compliance.</li>
<li>Collaborated with the team using an <b>Agile</b> approach with <b>Jira</b> and <b>Confluence</b>; used <b>GitHub Projects</b> for open-source projects.</li>
</ul>
<li><b>DataJoint Core/Elements</b> - DataJoint Core is an open-source toolkit for defining and operating computational data pipelines. DataJoint Elements is a collection of pre-assembled modules for neuroscience pipelines. [<a href="https://github.com/datajoint">GitHub</a>]</li>
<ul>
<li>Collaborated with internal scientists to standardize support for <b>MATLAB</b> and <b>GPU</b> across several workflows.</li>
<li>Implemented <b>dev containers</b> for <b>open-source</b> repositories so that any collaborator can work in a GitHub Workspace.</li>
<li>Integrated <b>mkdocs</b> to improve documentation development efficiency and reader experience.</li>
</ul>
<!-- style="list-style-type: none;" -->
<li class="italic-text">DataJoint Works, Core and Elements currently improve the research efficiency of 10+ neuroscience labs. My contributions improved DataJoint Works' robustness, flexibility and scalability, and automated manual toil through internal and external collaboration, raising productivity in both commercial and open-source development.</li>
</ul>
</div>
<!-- <div class="exp-content">
<div>
<b class="italic-text">* AWS:</b> Administered DataJoint's AWS account and several customers' AWS accounts. Configured <b>VPC</b>, <b>Subnet</b>, <b>Security Groups</b>, <b>IAM</b> roles and policies, <b>S3</b> lifecycle management, <b>EFS</b> access points, <b>EC2</b> instances, <b>RDS</b> instances, <b>Lambda</b> triggered by <b>SQS</b> or <b>EventBridge</b>, <b>SNS</b> and <b>SES</b>, <b>CloudWatch</b> metrics and alarms, <b>Route 53</b> DNS records, and <b>Secrets Manager</b> for deployment secrets.
</div>
<div>
<b class="italic-text">* CI/CD: </b> Developed generic <b>GitHub Actions</b> reusable workflows, used by <b>30+</b> repositories and following <a href="https://www.conventionalcommits.org/en/v1.0.0/">Conventional Commits</a>, <a href="https://learn.microsoft.com/en-us/devops/develop/how-microsoft-develops-devops">Release Flow</a> and <a href="https://opengitops.dev/">GitOps</a> best practices, to automate building, testing, releasing and publishing private or open-source <b>Python</b> packages [<a href="https://pypi.org/search/?q=datajoint">PyPI</a>] and deploying <b>Docker</b> images [<a href="https://hub.docker.com/u/datajoint">Docker Hub</a>].
</div>
<div>
<b class="italic-text">* Kubernetes: </b> Provisioned Kubernetes clusters hosted on EC2 instances for development, staging and production environments using <b>k3d or kOps</b>. Developed utility <b>bash</b> scripts with <b>helm</b> and <b>kubectl</b> to manage Kubernetes clusters more efficiently, including configuring the <b>Nginx ingress</b> controller, cert-manager with a <b>Let's Encrypt</b> issuer, the <b>Cilium</b> Container Network Interface (CNI), IAM Roles for Service Accounts (<b>IRSA</b>), <b>Cluster Autoscaler</b> and AWS Elastic Load Balancer (<b>ELB</b>), and deploying applications such as Percona XtraDB Cluster, Keycloak, JupyterHub, and Flask- and ReactJS-based web applications.
</div>
<div>
<b class="italic-text">* Ephemeral Worker Clusters: </b> Designed and developed a worker lifecycle manager in Python within one month to fulfill an <b>urgent</b> business requirement. It <b>polls</b> jobs from a MySQL database, then provisions and configures ephemeral EC2 instances with <b>Packer (pre-built AMI), Terraform and cloud-init</b> to compute jobs <b>at scale</b>; implemented an AWS S3 mount to significantly reduce raw data downloading <b>overhead</b> and added EFS as a file cache for intermediate steps to improve computation <b>failover</b>; configured the <b>NVIDIA CUDA toolkit</b> and <b>NVIDIA container runtime</b> for <b>GPU</b> workers.
</div>
<div>
<b class="italic-text">* Platform Automation: </b> Provisioned and terminated AWS resources using <b>boto3</b> or <b>Terraform</b>; managed customers' <b>RBAC</b> permissions using Keycloak and the GitHub REST API; generated usage and billing reports from the <b>AWS S3 Inventory</b> report, <b>AWS CloudTrail</b> and the <b>AWS Cost and Usage</b> report; built a <b>Plotly Dash</b> dashboard to analyze cost and usage efficiency.
</div>
<div>
<b class="italic-text">* JupyterHub: </b> Configured and maintained a JupyterHub deployment on a Kubernetes cluster, using <b>Node Affinity</b> to assign pods to different nodes by requirements and <b>Cluster Autoscaler</b> along with an <b>AWS Auto Scaling Group</b> to accommodate <b>100+</b> active users; reduced base images' <b>build time</b> and maintenance <b>overhead</b>.
</div>
<div>
<b class="italic-text">* Observability:</b> Implemented a small part of the metrics and alerts using <b>AWS CloudWatch</b>, and later integrated <b>Datadog</b> for Kubernetes clusters' and ephemeral EC2 instances' metrics and logging through the <b>OpenTelemetry</b> protocol, synthetic API testing, and UI/UX monitoring.
</div>
<div>
<b class="italic-text">* Security:</b> Set up codebase <b>vulnerability</b> scanning with FOSSA; set up <b>AWS Secrets Manager</b> working with <b>External Secret Store Operator</b> to secure Kubernetes secrets; deployed and administered self-hosted <b>Keycloak</b> for <b>RBAC</b> authentication, further integrated it with <b>AWS IAM</b> as an <b>identity provider</b> to access AWS resources through <b>STS</b>, and enabled OpenID Connect (<b>OIDC</b>) authentication flows such as the authorization code flow, client credentials flow and password grant flow.
</div>
<div>
<b class="italic-text">* MySQL Database:</b> Maintained a self-hosted <b>Percona XtraDB Cluster</b> with <b>daily database backups</b> stored on <b>S3</b>, <b>mysqldump</b> backup redundancy, Point-in-Time Recovery (<b>PITR</b>), <b>deadlock</b> detection, and slow query logging.
</div>
</div> -->
<!-- <br> -->
<div class="space-between bold-text">
<div>Software Engineer (MLOps)</div>
<div>May 2019 - July 2021</div>
</div>
<div class="space-between">
<div><b>dataVediK</b> - <span class="italic-text">Optimizes oil and gas operations with machine learning.</span></div>
<div>Houston, TX</div>
</div>
<div class="exp-content">
<ul>
<li><b>Hyper-converged Data Analysis Platform</b> - <span class="italic-text">A SaaS platform integrating data management, machine learning and data analytics services for oil and gas.</span> [<a href="https://www.agoraiot.com/marketplace/drillvedik">DrillVedik</a>]</li>
<ul>
<li>Implemented <b>CI/CD</b> pipelines with <b>Azure DevOps</b> and <b>Jenkins</b> for build, test, validation and deployment.</li>
<li>Integrated <b>MLflow</b> as the machine learning operations pipeline to improve model comparison, versioning and serving.</li>
<li>Set up <b>Airflow</b> to automate the data processing pipeline.</li>
<li>Developed the DrillVedik interactive drilling analytics dashboard with <b>Plotly Dash</b>, <b>Flask</b> and <b>Redis</b>.</li>
<li>Architected and developed the full stack of the prediction task manager web application with <b>HTML</b>, <b>CSS</b>, <b>JavaScript</b>, <b>Flask</b>, <b>Celery</b>, <b>RabbitMQ</b>, <b>gunicorn</b> and <b>nginx</b>.</li>
<li>Analyzed drilling pump operation data and trained multiple machine learning models to <b>classify</b> drilling status.</li>
<li>Researched and applied feature engineering on drilling sensor data, and trained a <b>regression</b> model for drilling speed prediction.</li>
</ul>
<li class="italic-text">Although this was an MVP project, I learned and practiced a variety of hands-on skills, from software development and deployment to machine learning and cloud computing. The collaboration also showed me the importance of DevOps.</li>
</ul>
</div>
<!-- <div class="exp-content">
<div>
<b class="italic-text">* Interactive Drilling Dashboard: </b>This is an <b>enterprise</b> product that I built with two other engineers. Developed a <b>Plotly Dash</b> dashboard that visualizes processed data using Bootstrap, CSS media queries, <b>Redis</b> and SQLAlchemy. Also implemented a <b>socket</b> service that sends a notification when the <b>Airflow</b> pipeline finishes processing, in order to <b>synchronize</b> (refresh) the dashboard's data.
</div>
<div>
<b class="italic-text">* CI/CD Pipeline: </b>Set up several <b>Azure Pipelines</b> for continuous development, testing and continuous deployment in <b>dev, test and prod</b> stages. Additionally, made a <b>Jenkins</b> pipeline to work with on-premise infrastructures.
</div>
<div>
<b class="italic-text">* ML Pipeline: </b>Set up an <b>MLflow</b> server for machine learning experiment logging, parameter tuning, continuous training, model management and model serving.
</div>
<div>
<b class="italic-text">* ETL Pipeline: </b>Working with a data engineer, set up an <b>Airflow</b> server for our data ETL pipeline.
</div>
<div>
<b class="italic-text">* Prediction Task Manager: </b>Working with a front-end developer, designed and developed a <b>production</b> web application that supports job queuing and parallel processing for drilling speed prediction using JavaScript, <b>Flask</b>, SQLAlchemy, <b>Celery</b>, RabbitMQ, gunicorn, Nginx, supervisord, Docker, AWS EC2, AWS Cognito authentication and HTTPS.
</div>
<div>
<b class="italic-text">* Drilling Status Detection: </b>Working with a domain expert, developed two <b>classification</b> models for detecting drilling status using Logistic Regression and Random Forest, with the help of the MLflow server.
</div>
<div>
<b class="italic-text">* Drilling Speed Prediction: </b>Working with a domain expert, applied Gaussian Process <b>Regression</b> for feature synthesis based on geographical information as well as <b>feature engineering</b> based on a correlation matrix and F1 score ranking, and built a non-linear regression model using an LSTM RNN.
</div>
</div> -->
</div>
<hr>
<div>
<h5>Education</h5>
<div class="space-between">
<div>Southern Methodist University, <i>Master's in Computer Science</i></div>
<div>Dallas, TX | Aug 2017 - May 2019</div>
</div>
<div class="space-between italic-text">
<div></div>
<div></div>
</div>
<div class="space-between">
<div>Qingdao University, <i>Bachelor's in Software Engineering</i></div>
<div>Qingdao, China | Aug 2013 - May 2017</div>
</div>
<div class="space-between italic-text">
<div></div>
<div></div>
</div>
</div>
</div>