diff --git a/docs_src/assets/CRSP_useful_variables.pptx b/docs_src/assets/CRSP_useful_variables.pptx
new file mode 100644
index 0000000..cc6bc57
Binary files /dev/null and b/docs_src/assets/CRSP_useful_variables.pptx differ
diff --git a/docs_src/discussion_01.md b/docs_src/discussion_01.md
index a24aa90..9e2c074 100644
--- a/docs_src/discussion_01.md
+++ b/docs_src/discussion_01.md
@@ -14,7 +14,10 @@
 discuss the various ways of interacting with Python: Google Colab, Jupyter Notebooks through the standard Jupyter server, Jupyter Notebooks in VS Code, using IPython in the command line, and running Python scripts directly from the command line (`.py` files).
-- **Individual Help with Setup.** Save 30-45 minutes at the end to help students individually with their setup.
+- **Individual Help with Setup.** Save 30-45 minutes at the end to help students individually with their setup. In this first lecture, it may be worth saving up to an hour
+for this. I hope to save some time at the end of each lecture for setup help as well.
+Setup help is an important purpose of this August review. The review is most useful if it is interactive and if students can get one-on-one help before
+the school year starts.
 
 ## Homework
 
diff --git a/docs_src/index.md b/docs_src/index.md
index cbb71e8..a9f18a2 100644
--- a/docs_src/index.md
+++ b/docs_src/index.md
@@ -101,6 +101,7 @@ HW3.md
 :maxdepth: 1
 :caption: Discussion 4️
 discussion_04.md
+using_CRSP_data.md
 _notebook_build/_04_wrds_python_package.ipynb
 _notebook_build/_04_CRSP_market_index.ipynb
 ```
diff --git a/docs_src/using_CRSP_data.md b/docs_src/using_CRSP_data.md
new file mode 100644
index 0000000..7acf814
--- /dev/null
+++ b/docs_src/using_CRSP_data.md
@@ -0,0 +1,26 @@
+# 4.1 Using CRSP Data
+
+In this discussion, we will learn how to use the CRSP dataset. The CRSP dataset is a comprehensive dataset that contains information on stock prices, returns, and other financial information.
This dataset is widely used in academic research and is a valuable resource for anyone interested in studying financial markets.
+
+There are several pitfalls that one might encounter when working with the CRSP dataset. This discussion covers some of these pitfalls and how to avoid them.
+
+## Key Concepts
+
+- Always read the manual first! Before working with a dataset, read the manual to understand how the data is structured and organized. It is easy to miss pitfalls if you skip this step. Here, we'll demonstrate some that show up in the CRSP data.
+  - You can find manuals for datasets in WRDS in the documentation section.
+  - You can also find video tutorials associated with many of the key datasets in WRDS. See here: https://wrds-www.wharton.upenn.edu/pages/video-support/
+
+- Go over the "Useful Variables" in CRSP described here: [CRSP Useful Variables](./assets/CRSP_useful_variables.pptx)
+  - What do the negative prices in CRSP mean?
+  - How does CRSP handle stock splits?
+
+- Merging CRSP and Compustat: WRDS provides a matrix of linking suggestions with recommendations on how to merge various datasets. See here: https://wrds-www.wharton.upenn.edu/pages/wrds-research/database-linking-matrix/ For CRSP and Compustat, there is a separate table that provides the links between the two: https://wrds-www.wharton.upenn.edu/pages/wrds-research/database-linking-matrix/linking-crsp-with-compustat/
+
+## Try it out yourself!
+
+- Download a sample of the CRSP dataset using the WRDS query form. Then, open a `.py` file and interactively explore it. Name the file `./src/CRSP_exploration.py`. I have provided an example in this repo.
+- When using the web query form, make sure you learn the following:
+  - How to save a query so you can reuse it later.
+  - How to explore backend information about the query.
+  - How to use SAS Studio to explore the data interactively.
+  - Where to access the documentation for the dataset.
\ No newline at end of file
diff --git a/env.example b/env.example
index 4201e5c..eef79e7 100644
--- a/env.example
+++ b/env.example
@@ -6,7 +6,7 @@
 # The default paths are these, specified as relative paths.
 DATA_DIR=./data
 OUTPUT_DIR=./output
-START_DATE=1913-01-01
+START_DATE=2020-01-01
 END_DATE=2023-10-01
 WRDS_USERNAME=jdoe
 PYDEVD_DISABLE_FILE_VALIDATION=1
diff --git a/env.example_alt b/env.example_alt
index 21ade3e..5af043d 100644
--- a/env.example_alt
+++ b/env.example_alt
@@ -5,7 +5,7 @@
 #
 DATA_DIR=D:/Dropbox/project_data/blank_project
 OUTPUT_DIR=C:/Users/jdoe/GitRepositories/blank_project/output
-START_DATE=1913-01-01
+START_DATE=2020-01-01
 END_DATE=2023-10-01
 WRDS_USERNAME=jdoe
 PYDEVD_DISABLE_FILE_VALIDATION=1
diff --git a/src/CRSP_exploration.py b/src/CRSP_exploration.py
new file mode 100644
index 0000000..5e690e1
--- /dev/null
+++ b/src/CRSP_exploration.py
@@ -0,0 +1,36 @@
+import numpy as np
+import pandas as pd
+
+import config
+DATA_DIR = config.DATA_DIR
+
+filepath = DATA_DIR / "CRSP_example.csv"
+df = pd.read_csv(filepath, parse_dates=["date"])
+df.info()
+
+df.head()
+
+
+(
+    df
+    .loc[df["PERMNO"] == 10026, ["date", "PRC"]]
+    .set_index("date")
+    .plot()
+)
+
+
+(
+    df
+    .loc[df["PERMNO"] == 10026, ["date", "vwretd"]]
+    .set_index("date")
+    .plot()
+)
+
+
+(
+    df
+    .loc[df["PERMNO"] == 10026, ["date", "vwretd"]]
+    .set_index("date")
+    .cumsum()
+    .plot()
+)
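
Review note on `src/CRSP_exploration.py`: the last plot applies `.cumsum()` to `vwretd`, which adds simple returns and only approximates cumulative growth. The sketch below, using made-up numbers rather than CRSP data, compares that sum with compounded growth. It also touches the "negative prices" question from the new docs page: CRSP stores `PRC` with a negative sign when the value is a bid/ask average rather than a closing trade price, so the absolute value recovers the price level. The frame `toy`, its values, and the new column names are hypothetical; only `PRC`, `vwretd`, and `date` come from the diff.

```python
import pandas as pd

# Hypothetical toy data (not real CRSP values); column names follow the diff.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
        "PRC": [10.0, -10.5, 11.0],  # negative sign flags a bid/ask average
        "vwretd": [0.10, -0.05, 0.02],
    }
).set_index("date")

# Keep a flag for bid/ask-average observations, then recover the price level.
toy["is_bid_ask_avg"] = toy["PRC"] < 0
toy["price"] = toy["PRC"].abs()

# Sum of simple returns (what .cumsum() plots) versus compounded growth.
toy["cum_sum"] = toy["vwretd"].cumsum()
toy["cum_compounded"] = (1 + toy["vwretd"]).cumprod() - 1

print(toy[["price", "is_bid_ask_avg", "cum_sum", "cum_compounded"]])
```

Over short windows the two cumulative series stay close, but the gap widens as returns accumulate, so the compounded version is usually the safer choice for a cumulative-return plot.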