Winter Quarter Project Group - Data Science at UCSB

Contributors: Raul Eulogio, David A. Campos, Jason Freeberg, Nathan Fritter

In Memory of..

The efforts and work of this quarter are dedicated to the memory of:

Fernando Regino (1993-2013)
Bernardino De Jesus (1993-2016)
Ivan Garcia Vergara (1991-2018)
Erik Alonso (1991-2009)
Jorge Zarate (1990-2008)

"When the lights shut off
And it's my turn to settle down
My main concern
Promise that you will sing about me" - Kendrick Lamar

Thank you to everyone who participated this quarter

Abstract

This repository serves as the itinerary for the Winter Quarter Project Groups of the Data Science at UCSB organization. It provides a weekly overview as well as the resources used in the weekly meetings.

Contributors:

Raul Eulogio -> rauleulogio3 [at] gmail.com
- GitHub: https://github.com/raviolli77/
David Campos -> dcampos.liz [at] gmail.com
- GitHub: https://github.com/dcamposliz
- Personal Site: http://davidacampos.com/
Jason Freeberg -> freeberg [at] umail.ucsb.edu
- GitHub: https://github.com/JasonFreeberg
- Personal Site: JasonFreeberg.github.io
Nathan Fritter -> nathan.fritter [at] gmail.com
- GitHub: https://github.com/Njfritter

Lesson Plan

Week 2: Introductions

Who are you?
- Name
- Major
- Year
- Where are you from?
Why are you here?
- What are you trying to accomplish in life?
- What are you trying to accomplish here?
- What are you trying to learn?
- What project(s) are you working on today?
- What recent failure have you had?
- Strengths & weaknesses as they relate to data science or in general?

Storm:

Goal of this group is to ultimately get projects finished and published
WHY
- We found that it is by working on projects that you actually get to learn and begin to understand how to do data science
Brainstorm on data science ideas
- Write them on a piece of paper
- Go to the front of the group and present it
- Have people walk up to you, or walk up to them, and persuade people to join your group

Collide:

Form teams
Mix up grade levels/experience
Discuss weaknesses, technologies, expertise, talent
- Pick R or Python
Establish Communication channels
- Facebook
- GroupMe
- Slack
- GitHub
- Phone
- Gmail/Email

Homework:

Find an interesting project online/from inertia7.com
- Read through contents
Catch up on your R/Python skills with DataCamp
Get to know each other
Become familiar with GitHub and create an account (for beginners and those who weren't here, we'll go into more detail in a later meeting)

Links to resources discussed in meeting:

Week 3: Why do a Data Science Project?

Some preliminaries

Does everyone in your team have:
- Slack account/channel within the dsprojectgroup Slack?
- GitHub account?
- R, Python, SQL set up on their machine? (Whatever y'all plan on using)
  - Speak about versions for language and packages/modules, especially in Python:
    - If you can answer these questions then you're fine: Do you know what a virtual environment is? And do you know its use?
    - If you don't know, have your team speak to me after for more clarification.
  - Which interface will your team be using, e.g. RStudio or Jupyter Notebook for R?
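
For anyone unsure about the virtual environment question: a virtual environment is an isolated folder with its own Python interpreter and packages, so each project can pin its own versions without touching the others. Below is a minimal sketch using Python's standard `venv` module. The folder name `project_env` is a hypothetical choice for illustration, not something the group prescribes.

```python
import tempfile
import venv
from pathlib import Path


def make_virtual_env(parent_dir, name="project_env"):
    """Create an isolated Python environment and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch fast; pass with_pip=True to also install pip
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demo in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    env_path = make_virtual_env(tmp)
    # Every environment created by venv contains a pyvenv.cfg file
    created_config = (env_path / "pyvenv.cfg").exists()
    print("Environment created:", created_config)
```

From a terminal, the usual equivalent is `python3 -m venv project_env`, followed by `source project_env/bin/activate` on macOS/Linux or `project_env\Scripts\activate` on Windows.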
Introduce the concept of Stand Ups
- Structure of an effective Stand Up:
  - What did I accomplish last meeting?
  - What will I do today?
  - What obstacles are impeding my progress? (Blockers)
Document everything in your Slack channel
- If you used a site to review R, Python, html, etc. post it within your group's channel
- Read a cool article relating to your project; document it on Slack
- This will become important when citing sources and creating documentation for your project, and it is just a good habit to develop since people deserve credit for helping you!
Trello
- Nathan will introduce the interface and how to integrate it into your workflow
- We might create a markdown file explaining it in more detail if people do not understand how to use it right away (but it is pretty easy to use).
- Resources:
  - Trello Tutorial
  - Trello Youtube Tutorial

What is a Data Science Project?

How to do a Data Science Project?
- Steps of a Data Science project:
  - Getting Data
    - UCI Machine Learning Repository
    - Kaggle datasets
  - Cleaning data/sanity checks
  - Exploratory Analysis
    - Trends in response and predictor variables
  - Modeling (Choosing Supervised Vs. Unsupervised Learning)
  - Model Validation
  - Sharing Results
    - Inertia7.com
    - GitHub repo with nice README.md
    - Jupyter/RMarkdown Notebook
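
The steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the CSV values are made up, the column names `hours_studied` and `exam_score` are hypothetical, and the model is a one-predictor least-squares line chosen for brevity rather than a recommendation.

```python
import csv
import io
from statistics import mean

# 1. Getting data: a tiny made-up CSV stands in for a UCI or Kaggle download
RAW_CSV = """hours_studied,exam_score
1,52
2,55
3,61
4,
5,70
6,74
7,79
8,85
"""


def load_rows(text):
    """Read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """2. Cleaning: drop rows with missing values and convert to floats."""
    cleaned = []
    for row in rows:
        if all(value and value.strip() for value in row.values()):
            cleaned.append((float(row["hours_studied"]), float(row["exam_score"])))
    return cleaned


def fit_line(points):
    """4. Modeling: least-squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def mean_absolute_error(points, intercept, slope):
    """5. Validation: average absolute prediction error on held-out points."""
    return mean(abs(y - (intercept + slope * x)) for x, y in points)


data = clean(load_rows(RAW_CSV))

# 3. Exploratory analysis: a quick look at the response variable
print("rows kept:", len(data), "| mean score:", round(mean(y for _, y in data), 2))

# Hold out the last two rows so validation uses data the model has not seen
train, holdout = data[:-2], data[-2:]
intercept, slope = fit_line(train)
error = mean_absolute_error(holdout, intercept, slope)

# 6. Sharing results: in a real project this summary would go in the README or a notebook
print("slope:", round(slope, 2), "| held-out MAE:", round(error, 2))
```

A real project would swap each function for something heavier (a download, a proper cleaning script, a library model), but the order of the steps stays the same.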

If you don't think you can do a project on your own right off the bat, try doing a project from Inertia7!

Here are some of my own repos where I have projects that aren't published on Inertia7:

Discuss what their project can look like given the structure of what they just hacked

Fill in the Steps of a Data Science Project

Homework: For this section, we can be lenient as to when this gets done. We expect the more advanced groups to be able to do this on their own. Newer groups can wait until the next meeting to have me or other members help with the process.

Build a proposal for your own project
- Get comfortable using Markdown notation
Create a repo in the Data Science Project Groups GitHub Account including these steps:
- Abstracts
- Finish filling the Steps of a Data Science Project
- Data Sources? Examples include, but are not limited to:
  - Kaggle
  - UCI
  - Data sets found in R
  - Quandl
  - API calls:
    - Wikipedia
    - Twitter
    - Google Maps
    - Saint Louis Federal Reserve
    - Google Analytics
- If not, then select a project from the suggested list or talk to me for project ideas

Links to resources discussed in meeting:
R/RStudio: https://www.rstudio.com/
Python: https://www.python.org/
Inertia7: http://www.inertia7.com/
GitHub: https://github.com/raviolli77
Trello: https://trello.com/
UCI ML Database: https://archive.ics.uci.edu/ml/datasets.html
Kaggle Datasets: https://www.kaggle.com/datasets
R Data sets: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
Quandl: https://www.quandl.com/
Wikipedia API: https://www.mediawiki.org/wiki/API:Main_page
Twitter API: https://dev.twitter.com/docs
Saint Louis Federal Reserve: https://fred.stlouisfed.org/
Google Analytics: https://www.google.com/analytics/#?modal_active=none
Jupyter Notebook: http://jupyter.org/
R Markdown: http://rmarkdown.rstudio.com/
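
As an example of the API-call data sources, here is a minimal sketch of querying the Wikipedia API linked above with only the standard library. The assumptions: it targets the English Wikipedia endpoint, it uses the `extracts` property (provided by the TextExtracts extension on Wikipedia) to get a page's plain-text introduction, and the User-Agent string is a hypothetical placeholder you should replace with your own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build a MediaWiki query URL asking for the plain-text intro of a page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def fetch_intro(title):
    """Download and return the intro text (requires an internet connection)."""
    request = Request(
        build_extract_url(title),
        headers={"User-Agent": "ds-project-sketch/0.1"},  # hypothetical identifier
    )
    with urlopen(request, timeout=10) as response:
        payload = json.load(response)
    pages = payload["query"]["pages"]
    # The response is keyed by page id, so take the first (only) page
    return next(iter(pages.values())).get("extract", "")


# Building the URL needs no network access, so it is safe to run anywhere
url = build_extract_url("Data science")
print(url)
```

Calling `fetch_intro("Data science")` performs the actual request. The other APIs in the list follow the same build-a-URL, parse-the-JSON pattern, though most of them also require an API key.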

Week 4: Project Iteration/GitHub

Some Preliminaries:

Are people interested in a Python Hackathon?
- If so when and where works best
Has your team created a GitHub Repo for your project within the organizational GitHub (Source: https://github.com/UCSB-dataScience-ProjectGroup)?
- Does it have a ReadMe explaining the Steps of a Data Science Project?
- Did you all agree which versions/interface for the language you will be using?
- Did you reach a conclusion of what models/approach you will take?
  - If not, give us an overview of what you plan to do. By the end of this meeting the project should be more or less decided.

Team Resources

Has your team...
- Been in contact through Slack?
- Been doing Stand Ups?
- Been addressing issues in going about your project or any preliminary practice for your project?
- Asked for help?

GitHub Crash Course

Here we're giving a quick overview of how GitHub works. The purpose is to serve as a rudimentary guide for those of you who are new to GitHub. We could spend an entire day going over the GitHub workflow, but for now we're concerned with just getting your feet wet, and soon creating a repo for your project if you haven't already.

NOTE: One can spend an entire day learning git, so we'll leave that out for this iteration. We will provide resources for git below!

Step 1:
- Create a GitHub account (Should go without saying, but you'd be surprised.)
Step 2:
- You should create a myProject folder where you keep all your projects. This will help with organization later on, both when you're doing a shit load of projects and before you publish them!
- Create a folder for your project where you will include things like, but not limited to:
  - README file (in .md format) - This file will be other people's introduction to your project, so make it pretty and easy to follow! I use Sublime Text to create and edit README files (there's a plethora of text editors like Notepad++, Atom, etc.; really it's all personal preference)
  - Script files - These files will be in the format of the language you are doing your project in, so an R, Python, or SQL file (.R, .py, or .sql)
  - Data file (not sure what the proper name for this is, will edit later) - This file is where your data is stored if you are using a static data source. Typically it can be a:
    - .csv file
    - .txt file
    - .json file
    - .db file
  - Image folder - For organizational purposes we usually create an image folder which is where we store all images produced in the project if we plan on hosting them or making them viewable without having to run/save the code. Inside this folder you will find static image files like:
    - .png files (favored for statistical graphics)
    - .jpeg
    - .gif
  - Once you get more acquainted with GitHub there will be more files that you will add, but for this example these will do
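
The folder layout from Step 2 can be created in a few lines. This is a minimal sketch, not a required structure: the subfolder name `data`, the file name `script.py`, and the project name `regression_exampleData` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def scaffold_project(root, project_name):
    """Create the project folder layout and return the project path."""
    project = Path(root) / project_name
    # Folder for static data files (.csv, .txt, .json, .db)
    (project / "data").mkdir(parents=True, exist_ok=True)
    # Folder for plots saved as .png, .jpeg, or .gif
    (project / "images").mkdir(exist_ok=True)
    # Empty README and script, ready to be filled in
    (project / "README.md").touch()
    (project / "script.py").touch()
    return project


# Demo inside a temporary directory; use your own myProject path in practice
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold_project(Path(tmp) / "myProject", "regression_exampleData")
    layout = sorted(p.name for p in project.iterdir())
    print(layout)
```

For an R project the same idea applies with a `script.R` file instead of `script.py`.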
Step 3:
- Once you have the folder for your project and all the respective files you wish to include in the repo, go to the main page of GitHub and click the green button that says New repository
- Add the Repo name: we usually name our repos using the pattern
  - statisticalModel_DataSetDescription, for example:
    - classification_IrisFlowersR
    - regression_bostonHousingR
- Add a description: give a brief overview of what your project will be about to help give people context. Ex.
  - A collection of alternate R markdown templates
  - Repo for a quick ggplot2 tutorial for Exploratory Analysis using Jupyter Notebook and R script
- Leave it as public: Make it accessible to everyone
- Initialize with a README - ALWAYS initialize with a README: this acts as an instructional overview for your project
  - You typically include steps that were required that you can't express in your code (e.g. creating a plotly account, or the steps needed if there are multiple scripts in your project)
  - A brief overview of your data set and statistical models used in the project
    - This will help later on if you plan to publish on inertia7!
  - Updates made to your project since its last iteration
  - Look at the inertia7 README's for some concrete examples
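
The naming pattern and the README contents listed in Step 3 can be captured in a small template generator. This is a sketch under assumptions: the section headings and the sample description are hypothetical wording, not an official inertia7 or group template.

```python
def repo_name(model, dataset):
    """Follow the statisticalModel_DataSetDescription naming pattern."""
    return f"{model}_{dataset}"


def readme_skeleton(name, description):
    """Return Markdown text with one section per README item from Step 3."""
    sections = [
        ("Overview", description),
        ("Setup steps", "Steps the code cannot express, such as creating accounts or the order in which to run scripts."),
        ("Data and models", "Brief overview of the data set and the statistical models used."),
        ("Updates", "Changes made since the last iteration."),
    ]
    lines = [f"# {name}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)


name = repo_name("regression", "bostonHousingR")
# The description below is placeholder text for the demo
readme = readme_skeleton(name, "Placeholder: one or two sentences on what the project is about.")
print(readme)
```

Writing the result to disk is one more line, for example `open("README.md", "w").write(readme)`.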
Step 4: Since you will be working in a team you have to be familiar with branches. Branches are different versions of the project, so they are a good way for your group to work on the project without fucking up the master branch
(Master Branch: This is the version the world will see and use, so make sure that this branch is the best iteration/is deployable)
- Create a branch and give it a name like ravi_branch
- You and each person in your team should have a branch that shows your iteration of the project, in case you go ahead and work on or test something you haven't discussed with your teammates yet.
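
Branches can be created from the GitHub website, but the command-line route is worth seeing too. The sketch below only builds and prints the git commands. It assumes git is installed, the repo is already cloned, and the remote is called `origin` (git's default name after cloning); the helper names are hypothetical.

```python
import subprocess


def branch_commands(branch, remote="origin"):
    """Return the git commands that create a branch and publish it."""
    return [
        ["git", "checkout", "-b", branch],      # create the branch and switch to it
        ["git", "push", "-u", remote, branch],  # publish it and track the remote copy
    ]


def run_all(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


commands = branch_commands("ravi_branch")
for command in commands:
    print(" ".join(command))
```

To execute the commands for real, call `run_all(commands, "path/to/your/repo")` with the path to your cloned repo, or simply type the printed lines into a terminal.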
Step 5: Once you and your group agree that your branch is the version you want on the master branch, the next step is creating a Pull Request.
(Pull Request: Allows people to review any changes made in a project, make modifications before the master branch changes, and overall help a team work efficiently)
- Go into the branch you want to merge (ravi_branch in our example)
- Click New Pull Request
  - Here you will see the two branches being compared: the base will typically be the master branch and the compare branch will be ravi_branch in our example.
  - Add a description of some of the changes you made!
  - GitHub will give you an overview of the changes made in files
  - Once you have reviewed everything click Create pull request
  - This is where other teammates will be notified of you wanting to merge your branch and the master branch
  - If everyone is in agreement you click Merge pull request
  - Then, click Confirm merge and the master branch will now have the same contents as ravi_branch
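
Before opening a Pull Request it can help to preview locally what GitHub will show as the changes. The sketch below builds the git commands for that preview. It assumes git is installed, the remote is named `origin`, and the base branch is `master` as in this guide; the function name is hypothetical.

```python
def review_commands(branch, base="master", remote="origin"):
    """Return git commands that preview what a Pull Request would contain."""
    return [
        # Refresh the local view of the remote branches
        ["git", "fetch", remote],
        # Commits that are on the branch but not yet on the base
        ["git", "log", "--oneline", f"{remote}/{base}..{branch}"],
        # Per-file summary of changes since the branch split off from the base
        ["git", "diff", "--stat", f"{remote}/{base}...{branch}"],
    ]


preview = review_commands("ravi_branch")
for command in preview:
    print(" ".join(command))
```

The three-dot form in `git diff` compares the branch against the point where it diverged from the base, which is the same comparison a Pull Request page displays.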

That's a quick and rough tutorial on working in GitHub. It doesn't go over everything, but it should give context as to how to work as a team using GitHub and branches. I have provided sources that go into more detail and definitely explain it better, so I would suggest reading up on them!

Homework:

Will depend on conversations we have on Wednesday to see where your team is at
Have a repo within the organizational repo by the end of today!
Create branches for each teammate
Set up a meeting time outside of Wednesday

Links to resources discussed in meeting (NOTE (2/14): Moved GitHub related resources to Recommended Resources for entire quarter):

Week 5: Project Iteration

Some Preliminaries:

Python Hackathon (Workshop)
- Steps needed to be taken before we can start/set up the hackathon:
  - Install Python 3.x
  - Use a Virtual Environment for your project if it will be in Python
- Fill out the Google survey sent last night:
  - We need to gauge date, time, and funds to make sure it will run smoothly
Rewards!!!
- HG Data Hackathon
  - Date proposition: April 21st from 2pm to 10pm
    - Most likely broken into 5-6 teams, pairing an HG Data Engineer with the respective teams
- Spoke with Jason
  - Informal presentation of projects with congratulatory refreshments
    - Reward for Best Data Visualization
    - Reward for Best insight/best modeling
    - Reward for Best presentation
- Jun Seo can speak of presentation of projects for library staff!
Major issues to address for today:
- Does every team have a requirements.txt for their project?
- Some README's need more detail (I will go around doing informal interviews with each group today)
- By today your team should have decided on its algorithms, methods, and Python version.
- Branches for team members

Depending on attendance, we want today to really show us the early iteration of your project, so:
Have a script with modules you will be using
Data set attached to your repo
Algorithms you will use
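
For teams that do not yet have a requirements.txt, the sketch below lists every package installed in the current Python environment as `name==version` lines, which is the format the file expects. It relies only on the standard library (`importlib.metadata`, Python 3.8+); the function name is hypothetical. Run it inside your project's virtual environment so that only the project's packages are listed.

```python
from importlib import metadata


def freeze_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata lacks a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


requirements = freeze_requirements()
print("\n".join(requirements))
```

The usual command-line shortcut is `pip freeze > requirements.txt`, and teammates can then recreate the environment with `pip install -r requirements.txt`.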

Week 6: Project Iteration/Blockers

Some Preliminaries:

Python Hackathon (Workshop)
- Confirmed Date: 2/25/2017 at 10 a.m.
- Buy shirts to rep!
  - Contact me after to get them from other officer. I can take Venmo!
Rewards (Reiterate because a lot of people were MIA)!!!
- HG Data Hackathon
  - Date proposition: April 21st from 2pm to 10pm
    - Most likely broken into 5-6 teams, pairing an HG Data Engineer with the respective teams
- Informal presentation of projects with congratulatory refreshments near end of this quarter
  - Reward for Best Data Visualization
  - Reward for Best insight/best modeling
  - Reward for Best presentation
- The informal presentation can be a prep for the presentation to the Library faculty
  - Most likely scheduled at the start of next quarter (Ask Jun-Seo if you have any questions)
- Project will be posted in the newest iteration of int7x (inertia7)!
Team Management
- Word from me regarding team
- We need teams to start applying Stand Ups now (Mandatory)
  - Must be done before starting your sessions and immediately when your team finishes the meet-up.
  - Will demonstrate again with more feedback given to teams

Today will serve as an important catch-up day for many teams since midterm season was (is) around.

I will go around to teams and ask about each project's:
- repository
- code
- README

Today will be focused mostly on iterating projects.

Week 7: I didn't prep this week

Carry on. Nothing to see here.

Week 8: Presentation/Flex Day

For this week I decided we are going to do a surprise project presentation.

Announcements: Thank you to everyone who participated in the Python Workshop!

I will need every team to do the following:

Update all scripts on their GitHub repo in the ProjectGroupWinter2017.
- README.md
- script.py
- All appropriate data files (e.g. csv files, txt files, etc.)
- Images (inside images folder) that were produced for this project
Be prepared to pitch your idea to me.
- Sell that shit.
- Why is your project relevant to Data Science and the data community as a whole?
(Not 100%) I would like to see some scripts/notebooks being run during the presentation, but due to time constraints we might only use what's on GitHub.
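
A quick way to check a repo against the list above before presenting is a small script like the following. It is a sketch under assumptions: the file names `README.md` and `script.py` and the `images` folder come from the list, while the set of data-file extensions and the function name are illustrative choices.

```python
import tempfile
from pathlib import Path

REQUIRED = ["README.md", "script.py", "images"]
DATA_SUFFIXES = {".csv", ".txt", ".json", ".db"}


def missing_items(repo_dir):
    """Return a list describing what the repo still lacks."""
    repo = Path(repo_dir)
    missing = [name for name in REQUIRED if not (repo / name).exists()]
    # Any file with a data-like extension anywhere in the repo counts
    has_data = any(
        p.suffix.lower() in DATA_SUFFIXES for p in repo.rglob("*") if p.is_file()
    )
    if not has_data:
        missing.append("data file")
    return missing


# Demo on a throwaway repo that only has a README and a script
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("# Demo\n")
    (repo / "script.py").write_text("print('hello')\n")
    report = missing_items(repo)
    print("Still missing:", report)
```

Point `missing_items` at your own cloned repo folder; an empty list means everything on the checklist is present.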

Each group presentation should be no longer than 15 minutes

Week 9: Quarter Wrap-Up

Final thoughts on quarter

Thank You's
Dedications
Food for thought for next quarter

Some Preliminaries:

FACTOR PI sale

Only $1 apiece! Go show some support to our friends at the Female Actuarial Association. Find event link Here

Location: SRB
Date: March 14, 2017
Time: 11AM - 3PM

Farmer's Data Talk

The Org. wants a packed house for the Farmer's Insurance Data Talk so let's all make it out! Facebook event link Here

Location: UCen SB Harbor Room
Date: March 9, 2017 (So tomorrow)
Time: 6PM - 8PM
Will NOT BE FOCUSED on actuary based stuff (Will focus on Natural Language Processing so highly relevant to our group)

HG Data Hackathon

Location: HG Data Offices
Time: April 21st
More on this later
Will most likely work on a tutorial with Calvin during Spring Break to help prep

Chapman Data Fest

Location: Chapman University
Time: April 21st as well
Team of 5 to attend
NOTE: Json wants the people attending the Chapman Data Fest to be from different class levels (i.e. Freshman, Sophomore, Junior, Senior, and Super Senior)
Let me know if you're interested in this event! Link for Event Here

Library presentations

We have a confirmed date!

Location: Same location so here
Time: April 26th at 7pm
Need y'all to use today to prep and keep track of progress!
- Make GitHub repos pretty
- Code readable
- Write nice docs
- Make plots pretty with titles, axis labels, and legends

Let's really flex for this. Everyone worked hard!

We would like your team to use inertia7 to present your projects so this is a good segue for the next section

inertia7 User Testing

We know dead week and finals are fast approaching but we were wondering if anyone would be interested in User-testing the new iteration of inertia7 to give constructive criticism.

Doesn't have to be publishing a project. Can just play with the app
If interested, talk to me or David
Follow Link to apply for credentials

Wrap-Up

Things needed by the end of this meeting:

Updated Scripts
Updated README's
Add any appropriate images
Create plotly account to publish plotly graphs (if applicable)
To-do list detailing what is still needed for your project
Keep in contact with partners over break.
If you're bored during break work on the project!

IMPORTANT TO NOTE: Since finals are approaching, your group needs to set this up in its repo, because there will be a gap period of 3 weeks. I need to know where your team is at and the context behind it. You CAN'T leave until your team shows me the repo and the outline of what is done and what isn't done.

Three weeks is a long time, and if there's no structure as to where you're at, you will forget and it will be hard to pick back up.

For those of you who feel you are ready to iterate on the presentation part of your project talk to me by the end of today's meeting.

Again thank you for a wonderful quarter and hope to see you all again next quarter!

Recommended Resources for entire quarter:

README Resources:
GitHub Resources:
Git Resources:
- Set Up Git Article
- Create a Repo Article
- Fork A Repo (Not discussed in this meeting, but an important part of the GitHub workflow)
- Be social (Great place to discover cool shit on GitHub)
- David's Git Repo
Text Editors Resources:
- Sublime Text
- Notepad++
- Atom
- vim
Python Resources:
- Python for Data Analysis (Brush up on NumPy and learn Pandas from the man who created it!)
- Vincent La's Personal Website (Raul's Note: Great place to review/learn Python if you're really rusty)
- Python Documentation (For more advanced users, the documentation for the programming language is a clutch resource)
- Learn Python the Hard Way (Haven't gone through it yet but will soon; dank resource for learning Python)
- Yhat (Great resource for machine learning application with Python)
- David's Repo: learnPython
- Hitchhiker's Guide to Python
- Sklearn Docs
- Plotly examples in Python
R Resources:
- R-bloggers (Great place to see people contributing projects and tutorials by real R users)
- ggplot2 docs
  - ggplot2 Cheat Sheet (For visualizations)
- Quick-R
- Plotly examples in R
- R for Data Science (Learn from some of the R greats including Hadley Wickham, creator of many famous R packages)
- An Introduction to Statistical Learning with R (Great book used in many UCSB PSTAT Classes)
Misc.
- Kaggle (Great resource for all things data science)
- DataCamp
- Analytics Vidhya (Lot of great tutorials relating to machine learning)
- Stack Overflow (Stack Overflow is love, Stack Overflow is life)
- w3schools tutorials (Great place to learn other important tools like, but not limited to: HTML, SQL (I used this one a lot), website development)
