CRSP introduction and usage example
jmbejara committed Aug 10, 2024
1 parent 45f966e commit 4e62f01
Showing 7 changed files with 69 additions and 3 deletions.
Binary file added docs_src/assets/CRSP_useful_variables.pptx
5 changes: 4 additions & 1 deletion docs_src/discussion_01.md
@@ -14,7 +14,10 @@ discuss the various ways of interacting with Python: Google Colab, Jupyter
Notebooks through the standard Jupyter server, Jupyter Notebooks in VS Code,
using IPython in the command line, and running Python scripts directly from the
command line (`.py` files).
-- **Individual Help with Setup.** Save 30-45 minutes at the end to help students individually with their setup.
+- **Individual Help with Setup.** Save 30-45 minutes at the end to help students individually with their setup. It's probably helpful to save up to an hour
+during this first lecture to help students with their setups. I hope to save some time at the end of each lecture for setup help as well.
+This is an important purpose of this August review: it is most useful when it is interactive and students can get one-on-one help before
+the school year starts.

## Homework

1 change: 1 addition & 0 deletions docs_src/index.md
@@ -101,6 +101,7 @@ HW3.md
:maxdepth: 1
:caption: Discussion 4️
discussion_04.md
+using_CRSP_data.md
_notebook_build/_04_wrds_python_package.ipynb
_notebook_build/_04_CRSP_market_index.ipynb
```
26 changes: 26 additions & 0 deletions docs_src/using_CRSP_data.md
@@ -0,0 +1,26 @@
# 4.1 Using CRSP Data

In this discussion, we will learn how to use the CRSP dataset. CRSP (the Center for Research in Security Prices) provides a comprehensive database of prices, returns, shares outstanding, and other security-level information for US-listed stocks. It is widely used in academic research and is a valuable resource for anyone studying financial markets.

Working with CRSP data comes with several pitfalls. In this discussion, we cover some of them and how to avoid them.

## Key Concepts

- Always read the manual first! Before working with a dataset, read its manual to understand how the data are structured and organized; otherwise, it is easy to miss some pitfalls. Here, we'll demonstrate a few that show up in the CRSP data.
- You can find the manuals for WRDS datasets in the documentation section of WRDS.
- Video tutorials are also available for many of the key datasets in WRDS: https://wrds-www.wharton.upenn.edu/pages/video-support/

- Go over the "Useful Variables" in CRSP described here: [CRSP Useful Variables](./assets/CRSP_useful_variables.pptx)
- What do the negative prices in CRSP mean?
- How does CRSP handle stock splits?

- Merging CRSP and Compustat: WRDS provides a matrix of linking suggestions with recommendations on how to merge its various datasets (https://wrds-www.wharton.upenn.edu/pages/wrds-research/database-linking-matrix/). For CRSP and Compustat, a separate table provides the links between the two; see https://wrds-www.wharton.upenn.edu/pages/wrds-research/database-linking-matrix/linking-crsp-with-compustat/
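
The questions above about negative prices and stock splits can be illustrated with a small pandas sketch. It assumes the stock file has the standard CRSP columns `PRC`, `SHROUT` (shares outstanding, in thousands), `CFACPR`, and `CFACSHR` (the cumulative factors to adjust price and shares); the function name `clean_crsp_prices` and the two-row sample are hypothetical. A negative `PRC` means CRSP had no closing price and reported the bid/ask average with a negative sign, so the absolute value is the usable price. Dividing prices by `CFACPR` and multiplying shares by `CFACSHR` puts observations on either side of a split on a common basis.

```python
import pandas as pd


def clean_crsp_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a CRSP stock file with cleaned and split-adjusted prices.

    Assumes the standard CRSP columns PRC, SHROUT, CFACPR, and CFACSHR.
    """
    out = df.copy()
    # A negative PRC flags a bid/ask average (no closing price), not a negative price.
    out["is_bidask_avg"] = out["PRC"] < 0
    out["price"] = out["PRC"].abs()
    # Market equity in thousands of dollars, since SHROUT is in thousands of shares.
    out["me"] = out["price"] * out["SHROUT"]
    # Cumulative adjustment factors put prices and share counts on a common
    # basis across stock splits.
    out["adj_price"] = out["price"] / out["CFACPR"]
    out["adj_shrout"] = out["SHROUT"] * out["CFACSHR"]
    return out


# Hypothetical two-row sample with a 2-for-1 split between the rows.
sample = pd.DataFrame(
    {
        "PRC": [-100.0, 52.0],
        "SHROUT": [1000.0, 2000.0],
        "CFACPR": [2.0, 1.0],
        "CFACSHR": [2.0, 1.0],
    }
)
print(clean_crsp_prices(sample)[["price", "adj_price", "adj_shrout", "me"]])
```

In this sample, the first row's bid/ask average of 100 becomes 50 after adjustment, which is directly comparable with the post-split price of 52. Market equity uses the unadjusted price and share count, since a split changes both by offsetting amounts.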
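
The CRSP/Compustat merge described above can be sketched in pandas as follows. This is a minimal sketch, not an official WRDS procedure: it assumes a link table with the column names commonly used in the WRDS CCM link table (`gvkey`, `lpermno`, `linktype`, `linkprim`, `linkdt`, `linkenddt`) and keeps a common subset of links (`linktype` of `LU` or `LC`, `linkprim` of `P` or `C`). A missing `linkenddt` is treated as a link that is still active. The function name, the far-future placeholder date, and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical placeholder end date for links that are still active (missing linkenddt).
FAR_FUTURE = pd.Timestamp("2100-12-31")


def link_crsp_to_compustat(crsp: pd.DataFrame, links: pd.DataFrame) -> pd.DataFrame:
    """Attach a Compustat GVKEY to each CRSP (PERMNO, date) row with a valid link."""
    # Keep only a commonly used subset of link types and primary-link flags.
    good = links[
        links["linktype"].isin(["LU", "LC"]) & links["linkprim"].isin(["P", "C"])
    ].copy()
    good["linkenddt"] = good["linkenddt"].fillna(FAR_FUTURE)

    merged = crsp.merge(good, left_on="PERMNO", right_on="lpermno", how="inner")
    # A link only applies to dates inside its [linkdt, linkenddt] window.
    in_window = (merged["date"] >= merged["linkdt"]) & (
        merged["date"] <= merged["linkenddt"]
    )
    return merged.loc[in_window, ["PERMNO", "date", "gvkey"]].reset_index(drop=True)


# Hypothetical sample: PERMNO 1 has two links over time; PERMNO 2's link type is excluded.
crsp = pd.DataFrame(
    {
        "PERMNO": [1, 1, 2],
        "date": pd.to_datetime(["2000-01-31", "2010-01-29", "2010-01-29"]),
    }
)
links = pd.DataFrame(
    {
        "gvkey": ["000001", "000003", "000002"],
        "lpermno": [1, 1, 2],
        "linktype": ["LC", "LU", "NR"],
        "linkprim": ["P", "P", "P"],
        "linkdt": pd.to_datetime(["1995-01-01", "2006-01-01", "1990-01-01"]),
        "linkenddt": pd.to_datetime(["2005-12-31", None, None]),
    }
)
print(link_crsp_to_compustat(crsp, links))
```

Filtering on the date window after the merge is what keeps each CRSP row matched to at most the link that was valid on that date; merging on PERMNO alone would duplicate rows for securities with several links.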

## Try it out yourself!

- Download a sample of the CRSP dataset using the WRDS query form. Then, create a `.py` file named `./src/CRSP_exploration.py` and use it to explore the data interactively. I have provided an example in this repo.
- When using the web query form, make sure you learn the following:
  - How to save a query so you can reuse it later.
  - How to explore backend information about the query.
  - How to use SAS Studio to explore the data interactively.
  - Where to access the documentation for the dataset.
2 changes: 1 addition & 1 deletion env.example
@@ -6,7 +6,7 @@
# The default paths are these, specified as relative paths.
DATA_DIR=./data
OUTPUT_DIR=./output
-START_DATE=1913-01-01
+START_DATE=2020-01-01
END_DATE=2023-10-01
WRDS_USERNAME=jdoe
PYDEVD_DISABLE_FILE_VALIDATION=1
2 changes: 1 addition & 1 deletion env.example_alt
@@ -5,7 +5,7 @@
#
DATA_DIR=D:/Dropbox/project_data/blank_project
OUTPUT_DIR=C:/Users/jdoe/GitRepositories/blank_project/output
-START_DATE=1913-01-01
+START_DATE=2020-01-01
END_DATE=2023-10-01
WRDS_USERNAME=jdoe
PYDEVD_DISABLE_FILE_VALIDATION=1
36 changes: 36 additions & 0 deletions src/CRSP_exploration.py
@@ -0,0 +1,36 @@
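import numpy as np
import pandas as pd

import config

DATA_DIR = config.DATA_DIR

# Load the CRSP sample downloaded from the WRDS web query form.
filepath = DATA_DIR / "CRSP_example.csv"
df = pd.read_csv(filepath, parse_dates=["date"])
df.info()

# Displays the first rows when run interactively (e.g., in IPython or a notebook cell).
df.head()


# Closing price for PERMNO 10026. Negative values are not errors: CRSP stores
# the bid/ask average with a negative sign when there is no closing price.
(
    df
    .loc[df["PERMNO"] == 10026, ["date", "PRC"]]
    .set_index("date")
    .plot()
)


# Value-weighted market return including distributions (vwretd). It is the same
# for every stock on a given date, so filtering to one PERMNO extracts it
# (for the dates on which that PERMNO appears in the sample).
(
    df
    .loc[df["PERMNO"] == 10026, ["date", "vwretd"]]
    .set_index("date")
    .plot()
)


# Cumulative market return: compound the simple returns, (1 + r).cumprod() - 1,
# rather than summing them.
(
    df
    .loc[df["PERMNO"] == 10026, ["date", "vwretd"]]
    .set_index("date")
    .add(1)
    .cumprod()
    .sub(1)
    .plot()
)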
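
In the script above, `vwretd` is extracted through PERMNO 10026, so the series only covers dates on which that security appears in the sample. Because `vwretd` is a market-level return repeated on every stock row for a given date, another option is to keep one row per date before plotting or compounding. This is a minimal sketch, assuming the same `date` and `vwretd` columns as the script and that the market column is identical across rows on the same date; the function name and sample data are hypothetical.

```python
import pandas as pd


def market_return_by_date(df: pd.DataFrame, col: str = "vwretd") -> pd.DataFrame:
    """One market return per date, plus the compounded cumulative return.

    Assumes `col` is identical across all stock rows on the same date.
    """
    ret = (
        df.drop_duplicates(subset="date")  # keep one row per trading date
        .set_index("date")[col]
        .sort_index()
    )
    out = ret.to_frame("ret")
    # Compound simple returns: (1 + r1) * (1 + r2) * ... - 1
    out["cum_ret"] = (1 + out["ret"]).cumprod() - 1
    return out


# Hypothetical sample: PERMNO 1 drops out of the sample after the first date.
sample = pd.DataFrame(
    {
        "PERMNO": [1, 2, 2],
        "date": pd.to_datetime(["2020-01-31", "2020-01-31", "2020-02-29"]),
        "vwretd": [0.10, 0.10, -0.05],
    }
)
print(market_return_by_date(sample))
```

Filtering this sample to PERMNO 1 would lose the February return, while deduplicating by date keeps both months. Compounding, rather than summing, gives the growth of one dollar invested in the index, minus one.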
