Skip to content

Commit

Permalink
Notes on getting started, pull requests process, git diff options, da…
Browse files Browse the repository at this point in the history
…ta viz. Drop Asana; push VSCode. (#4)
  • Loading branch information
jdingel authored Sep 29, 2024
1 parent 4d7325f commit 9bfa059
Show file tree
Hide file tree
Showing 9 changed files with 278 additions and 30 deletions.
142 changes: 131 additions & 11 deletions logbook/entries/codereview.tex
Original file line number Diff line number Diff line change
@@ -1,14 +1,9 @@
BitBucket and GitHub offer a bunch of browser-based tools supporting code review and comments.
Jonathan now prefers this method of providing feedback rather than writing comments in files.
GitHub offers a bunch of browser-based tools supporting code review and comments.
Jonathan much prefers this method of providing feedback over writing comments in files.
One merit is that the history of feedback persists without cluttering up active code.
Another is that threaded conversations are much cleaner than writing back-and-forth within code.

Note: BitBucket offers some experimental features from time to time,
which might change the interface and make the following instructions not applicable.
Please opt out experimental features by going to ``Settings - Labs",
and disable all the ``BETA" features.

\subsection{Default code review process}\label{code_review_process}
\subsection{Code review process}\label{code_review_process}
Given a task, the default process for writing code and submitting it for review is the following:
\begin{enumerate}
\item Create a new branch when you create a new task (\href{https://github.com/gslab-econ/ra-manual/wiki/Tasks}{G\&S}: ``The assignee should create a separate git branch for each task that involves code or output stored in the repository.'')
Expand All @@ -17,21 +12,146 @@ \subsection{Default code review process}\label{code_review_process}
\item Submit a pull request when your code is ready to be reviewed.
\item Team uses the pull request to review code and have a conversation about the task
\item Additional commits are made on the branch in response to discussion and feedback.
\item Finalized task is added to master branch using \href{https://bitbucket.org/blog/git-squash-commits-merging-bitbucket}{a merge that squashes} the branch's commits.
\item Finalized task is added to master branch using \href{https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#squash-and-merge-your-commits}{a merge that squashes} the branch's commits.
\end{enumerate}

A reviewer will typically execute the following steps when reviewing a pull request.

\begin{enumerate}
\item Run \texttt{make} in \texttt{tasks/task\_graph} to create \texttt{task\_flow\_branchdiff.png}
(Note: in order for the graph to be informative, your branch must be up-to-date with main.)
\item Referencing \texttt{task\_flow\_branchdiff.png},
delete all input and output folders in the tasks that were created or
modified on the branch.
\item Run \texttt{make} in the tasks that were created or modified on the branch.
Verify that committed reports and outputs match those generated on the reviewer's machine.
\item Review the code.
\item Compile the paper, slides, and logbook to review them.
\end{enumerate}

The person submitting the request---who
will select themselves as the ``assignee'' on GitHub---should
go through these steps before submitting a pull request
to catch common bugs in advance.
It is ideal if the reviewer can immediately focus on the substance of the code
and not have to spend time notifying the assignee about errors in scripts or \texttt{Makefile}s.

We use GitHub actions to automatically check
that Makefiles are valid when a pull request is submitted.
You can see the status of these checks on the GitHub pull request page.
If any fail,
fix the mistakes and then re-request review to run the checks again.
Note that you can also run the checks manually by executing the shell
scripts in \texttt{tasks/check\_makefiles}.
Similarly, running \texttt{make} in \texttt{tasks/maintenance} flags scripts and \texttt{Makefile}s that
violate best practices or our style preferences.

When reviewing a pull request,
you should check the code for correctness and efficiency.
Consult the logbook entry on writing code (\ref{sec:writing-code}) for our style preferences.

\subsection{Notes on the process}
These notes are for the assignee,
but the reviewer should also be checking for compliance with these guidelines.

\subsubsection{Commit data reports}
We have tools in place to generate reports that summarize the characteristics of datasets.
In Stata, datasets should be saved using the \texttt{save\_data} command,
as it automatically generates these reports.
The command has the additional benefit of enforcing that datasets have unique,
non-missing keys which are specified with the \texttt{key()} option
(see the section dedicated to keys in \href{https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf}{Gentzkow and Shapiro's Code and Data} for why this is a sensible requirement for your datasets).
We have similar tooling for \texttt{R} and \texttt{Python}.

By committing the reports to GitHub, we can track changes to datasets.
This practice allows us to quickly identify the reason for a change in results,
either when run by our collaborators on a different machine or by someone
who is replicating our work.
Ensure that you have committed the reports for all datasets
that you have created or modified before submitting a pull request.

\subsubsection{Avoid large pull requests if possible}
Similar to the rule that more commits are better than fewer commits,
it is better to have modest-sized pull requests at higher frequency
than gargantuan pull requests at lower frequency.
Smaller pull requests are not only more convenient for the reviewer;
they also reduce merge conflicts
and make it easier to track down bugs later.
Both commits and pull requests require brief messages or titles,
which capture the changes that occurred.
The brevity of the messages hints that commits and pull requests are designed
to make a single change and to process the set of changes related to a single task, respectively.

\subsubsection{Wait for approval before merging into main}
Sometimes it makes sense to have multiple reviewers for a pull request.
For example, in a pull request that adjusts content within the paper,
the authors will likely all want to sign off on the changes.
You should wait for the reviewers' consensus approval before merging into main.

\subsubsection{Resolve merge conflicts}
During this process, you may be told that there are merge conflicts between
your branch and main that must be addressed before the pull request can be completed.
This message indicates that your branch and main have made changes to the same files.
You can easily resolve these conflicts from the terminal.
First, run \texttt{git pull} on the main branch to get the latest changes.
Then, switch to your branch and run \texttt{git merge main}.
Git lists the files that contain these conflicts
(see progress with \texttt{git status}), marks the problematic lines within the files,
and will not complete the merge until you manually
select the changes that you want to keep.
See this guide from GitHub for
\href{https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/}{resolving merge conflicts using the command line}.
Text editors often have tools to assist in resolving merge conflicts.
See here for VS Code's \href{https://code.visualstudio.com/docs/editor/versioncontrol#_merge-conflicts}{merge conflict resolution interface}.

\subsubsection{Handle untracked files}
\texttt{git status} informs you of any untracked files sitting on your branch.
Untracked files should be committed if you want to include them in the pull request.
Sometimes these files can prevent you from executing a merge
(e.g., the files are artifacts from your work on another branch, and
the merge would overwrite your local copies).
If a file has relevant changes, you should commit it before completing the merge.
If you simply want to take the copy of the file from the branch that you are merging
in---since you did not edit it on your branch, or it is no longer relevant---you
can safely delete it.
You can easily delete untracked files with \texttt{git clean}.
Run \texttt{git clean -n} to see what files will be deleted,
and then run \texttt{git clean -f} to delete them.

\subsubsection{Add packages to setup\_environment}
If you use a package in Stata, R, Python, or Julia,
you should add it to the appropriate \texttt{setup\_environment} file before
submitting a pull request.
This ensures that the code will run on other machines when proper setup
instructions are followed.

\subsection{Commenting on commits}

The screenshots in these examples show BitBucket, a service similar to GitHub.

\subsubsection{Comment on a file}
Jonathan wrote two comments about data visualization on commit \texttt{abf9191}.
Scrolling through the commit webpage, Jonathan clicks on ``comment'' for \texttt{output/figures/summarystats/device\_byzip.eps} and says the following:
\begin{center}\includegraphics[width=.8\textwidth]{./figures/workflow/BitBucket_screenshot_commenting1.png}\end{center}
Ningyin can reply to comment to ask for clarification, and Jonathan can link to this commit comment when assigning tasks.
Ningyin can reply to the comment to ask for clarification, and Jonathan can link to this commit comment when assigning tasks.

\subsubsection{Comment on a line}
You can also write comments on individual lines of a file.
Jonathan spotted a ``force'' option on line 99 of \texttt{code/gen\_census\_data.do} and wrote a comment seeking clarification.
He embeds his code suggestion as part of the comment, complete with BitBucket's language-specific syntax highlighting.
He embeds his code suggestion as part of the comment, complete with language-specific syntax highlighting.
\begin{center}\includegraphics[width=\textwidth]{./figures/workflow/BitBucket_screenshot_commenting2.png}\end{center}

\subsection{Submit a GitHub pull request review}
When making your first comment, select the ``Start a review'' option to initiate a review.
You can track the files that you have reviewed by clicking on the ``Viewed'' button
in the top right corner of the file in the ``Files changed'' tab.
After you have reviewed all files,
click the ``Review changes'' button in the top right corner of the page,
write any summary comments, and select ``Approve'' or ``Request changes''.
All the comments that you made will then be submitted, and the assignee will be notified.
If you are interested in reviewing pull requests outside a browser,
\href{https://cli.github.com/}{GitHub command line} has commands for
reviewing pull requests from the terminal
(and other GitHub operations that are usually done from the browser).
There are also text-editor plugins that allow you to review pull requests,
such as \href{https://marketplace.visualstudio.com/items?itemName=GitHub.vscode-pull-request-github}{GitHub Pull Requests and Issues} for VS Code.
35 changes: 35 additions & 0 deletions logbook/entries/gettingstarted.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
If you just joined our team, welcome.
A few suggestions about getting started:
\begin{itemize}
\item
Be prepared to get stumped and make mistakes.
Ask the coauthors and RAs for help.
We've all been there.
Do not struggle alone in silence.
\item
An existing project is a trove of material demonstrating how we work in practice.
Ask a coauthor to add you to a repository so that you can read through others' pull request reviews.
\item
Some RAs think mastering command-line Git is better than using a GUI app:
``using the command line for Git forces you to know and understand the Git commands you issue.''
\item
An existing project is a trove of material demonstrating how we work in practice.
Read existing Makefiles to learn how Make works.
\item
ChatGPT is very good at both explaining shell scripts and composing new ones based on precise instructions.
ChatGPT is a tool that you should leverage.
You are responsible for your outputs.
\item
Try to apply our computing tools and workflow to research substance that you've already mastered.
Were you an RA? Did you write a thesis?
Take that data and code and get it on GitHub, take your Word document and write it in \LaTeX, write a Makefile to replace your \texttt{master.R}, and so forth.
\item
We hope that one of your first assigned tasks will be ``refactoring'' code (improving code without creating new functionality).
Because the existing code already has valid inputs and outputs,
this assignment will provide a complete description of the pre-requisites and a clear benchmark by which your work will be evaluated.
Look for the ``refactoring/rewriting'' label on GitHub issues.
\item
We don't do \href{https://en.wikipedia.org/wiki/Pair_programming}{pair programming},
but you should aim to work on code live in front of other team members during your early weeks.
They'll notice shortcuts and tools you're failing to deploy as you work.
\end{itemize}
54 changes: 53 additions & 1 deletion logbook/entries/git_alone.tex
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ \subsection{Version control in a single-machine environment}
\item \texttt{git commit} commits a set of changes
\item \texttt{git log} lists the history of commits
\end{itemize}
See ``\href{https://git-scm.com/book/en/v1/Git-Basics}{Git Basics}'' on these.
See ``\href{https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository}{Git Basics}'' on these.

A couple more resources:
\begin{itemize}
Expand Down Expand Up @@ -132,3 +132,55 @@ \subsubsection{Undoing changes}
To do that, use \texttt{git reset --soft HEAD\texttt{$\sim$}1}.
This command will remove the last commit from the current branch and keep previously commited files in the staged area.
For a more detailed description, see \href{https://www.atlassian.com/git/tutorials/undoing-changes/git-reset}{Atlassian guide}.

\subsubsection{More notes on \texttt{git diff}}
The options \texttt{--name-only}, \texttt{--name-status}, and \texttt{--compact-summary} are quite helpful.

\texttt{git diff --name-only} shows only which files have changed, not the changed contents.
\begin{lstlisting}[language=bash]
jdingel$ git diff HEAD --name-only
logbook/entries/gettingstarted.tex
logbook/entries/git_alone.tex
logbook/entries/git_together.tex
logbook/entries/researchinfrastructure.tex
logbook/logbook.tex
\end{lstlisting}

\texttt{git diff --name-status} shows whether files are added (A), modified (M), deleted (D), or renamed (R).
This branch has added \texttt{gettingstarted.tex} and renamed \texttt{asana.tex} to \texttt{taskassignments.tex}.
\begin{lstlisting}[language=bash]
jdingel$ git diff origin/master --name-status
M logbook/entries/codereview.tex
A logbook/entries/gettingstarted.tex
M logbook/entries/git_alone.tex
M logbook/entries/git_together.tex
M logbook/entries/researchinfrastructure.tex
M logbook/entries/statanotes.tex
R077 logbook/entries/asana.tex logbook/entries/taskassignments.tex
M logbook/entries/visualization.tex
M logbook/logbook.tex
\end{lstlisting}

\texttt{git diff --compact-summary} shows you how much files have changed.
When I added the first page of this logbook to the \texttt{README.md} file,
85 lines of that file changed, but there were minor edits to four other files too.
\begin{lstlisting}[language=bash]
jdingel$ git log d3bc9642 | head
commit d3bc96426aa929b39281816ec0fbfcbbdac104ee
Author: JDingel <[email protected]>
Date: Wed Mar 22 22:57:26 2023 -0500

Adds the first page of to README.md

commit 4ba5370eaf7b59e0dd29b0bbfefa8910d6c1250c
Author: Jonathan Dingel <[email protected]>
Date: Thu Jun 2 18:57:37 2022 -0500

jdingel$ git diff 4ba5370eaf7b5 d3bc9642 --compact-summary
Makefile (new) | 7 +++++
README.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++-
header.md (new) | 11 +++++++
logbook/entries/researchinfrastructure.tex | 2 +-
logbook/entries/tasks.tex | 2 +-
5 files changed, 104 insertions(+), 3 deletions(-)
\end{lstlisting}
6 changes: 3 additions & 3 deletions logbook/entries/git_together.tex
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@ \subsection{Online hosting services and a GUI client}

While you can run Git from the command line,
you might find it a little easier to have a GUI as you learn Git.
I recommend using \href{https://www.sourcetreeapp.com/}{SourceTree},
a free Git GUI for OS X, although it's the only Git GUI I've used.\footnote{
I recommend using the \href{https://code.visualstudio.com/docs/sourcecontrol/overview}{Git tools that are integrated into VSCode}, my preferred text editor.
Other options are
\href{https://github.com/apps/desktop}{GitHub Desktop} and \href{https://www.sourcetreeapp.com/}{SourceTree} (for MacOS).\footnote{
Some of my coauthors have had poor experiences using SourceTree on their Windows machines.
}
SourceTree is by Atlassian, the folks also behind BitBucket.
You can \href{https://confluence.atlassian.com/get-started-with-sourcetree/work-using-git-847359053.html}{use the app to} pull, commit, push, diff, etc.
If you're on Windows, perhaps try \href{https://www.gitkraken.com/}{GitKraken}.

\subsection{Clone the remote repo}

Expand Down
6 changes: 4 additions & 2 deletions logbook/entries/researchinfrastructure.tex
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
We organize the project as a series of tasks, so our organization of code and data takes a task-based perspective.
After writing code, we automate its execution via \texttt{make}.
We track our code (and the rest of the project) using \texttt{Git}, a version control system.
Collaboration occurs via Asana task assignments, GitHub/BitBucket code reviews, and logbook entries that share research designs and results.
Collaboration occurs via issue/task assignments, pull requests, and logbook entries that share research designs and results.

Use a good text editor like
\href{https://www.sublimetext.com/}{SublimeText},
Expand All @@ -22,10 +22,12 @@
\href{https://en.wikipedia.org/wiki/Syntax_highlighting}{syntax highlighting},
\href{https://en.wikipedia.org/wiki/Command-line_completion}{tab autocomplete},
and \href{https://www.sublimetext.com/}{multiple selection}.
We recommend VSCode, which supports Git, remote development via SSH, and a GitHub extension.

Our approach assumes that you'll use Unix/Linux/MacOSX.
Plain-text social science lives at the *nix command line.
\href{https://github.com/gslab-econ/ra-manual/wiki/Getting-Started}{Gentzkow and Shapiro}: ``The command line is our means of implementing tools.''
Per \href{https://jeroenjanssens.com/dsatcl/chapter-1-introduction.html\#why-data-science-at-the-command-line}{Janssens (2014)}: ``the command line is: agile, augmenting, scalable, extensible, and ubiquitous.''
Here are four intros to the Linux shell:
\begin{itemize}
\item \url{https://ryanstutorials.net/linuxtutorial/}
Expand All @@ -34,7 +36,7 @@
\item William E. Shotts, Jr's ``\href{http://linuxcommand.org/lc3_learning_the_shell.php}{Learning the Shell}''
\end{itemize}
Getting started at the command line can be a little overwhelming, but it's well worth it.
While you can use GUI apps to interact with most of our workflow (e.g., SourceTree for version control),
While you can use GUI apps to interact with most of our workflow (e.g., GitHub Desktop),
automation of some key parts relies on shell scripts.
See logbook entry~\ref{entry:unixshelltips} for a haphazard collection of shell tips.

Expand Down
2 changes: 1 addition & 1 deletion logbook/entries/statanotes.tex
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

\subsubsection{Command-line execution}

For old versions (before Stata 16), a few Stata commands only work in ``interactive'' mode and not when Stata is invokved at the command line.
For old versions (before Stata 16), a few Stata commands only work in ``interactive'' mode and not when Stata is invoked at the command line.
\begin{itemize}
\item \texttt{graph export}:
``ps and eps are available for all versions of Stata; png and tif are available for all versions of Stata except Stata(console) for Unix; pdf is available only for Stata for Windows and Stata for Mac.''
Expand Down
Loading

0 comments on commit 9bfa059

Please sign in to comment.