
Commit

Initial commit
mle2718 authored Sep 13, 2024
0 parents commit 6b7633f
Showing 36 changed files with 1,195 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitattributes
@@ -0,0 +1,5 @@
# End of line characters are a problem when the team is developing on Windows (CRLF), Unix (LF), and Mac (LF)
# https://www.aleksandrhovhannisyan.com/blog/crlf-vs-lf-normalizing-line-endings-in-git/
# This small file will fix that. Line endings in the repository will be set to LF while local line endings will be left "alone".
# This lets Windows users develop with CRLF but push with LF.
* text=auto
13 changes: 13 additions & 0 deletions .github/workflows/secretScan.yml
@@ -0,0 +1,13 @@
name: gitleaks

on: [push, pull_request]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: '0'
      - name: gitleaks-action
        uses: gitleaks/[email protected]
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
.Rproj.user
*.Rproj
.Rhistory
*.RData
*.Rdata
*.Rds
.Ruserdata
*credentials*
*Credentials*
*.log
*.html
library.bib
*.pdf
*.doc
*.docx
*.aux
*.out
*.dta
5 changes: 5 additions & 0 deletions License.txt
@@ -0,0 +1,5 @@
Software code created by U.S. Government employees is not subject to copyright in the United States (17 U.S.C. §105).
The United States/Department of Commerce reserve all rights to seek and obtain copyright protection in countries other
than the United States for Software authored in its entirety by the Department of Commerce. To this end, the Department
of Commerce hereby grants to Recipient a royalty-free, nonexclusive license to use, copy, and create derivative works of
the Software outside of the United States.
53 changes: 53 additions & 0 deletions README.md
@@ -0,0 +1,53 @@
# Project Template

A small repository that will help you set up an organized project. It also includes a little sample code showing how to get data off the NEFSC Oracle servers.

1. A folder structure and some utilities that will (hopefully) help you keep things organized.
1. Getting data:
    1. Sample code for extracting data from Oracle using Stata.
    1. Sample code for extracting data from Oracle using R with ROracle and RODBC.
    1. Sample code for extracting data from the St. Louis Fed using Stata.
    1. If you need to extract data from Oracle on one of the NEFSC servers, look [here](https://github.com/NEFSC/READ-SSB-LEE-On-the-servers).
1. A citation style file (ajae_mod.csl) and a LaTeX preamble (preamble-latex.tex) that might make your life easier if you are using Markdown to make PDFs.

# How to use

1. Create a [new repository](/images/new_repository.jpg) using this as a [template](/images/from_template.jpg).
2. Clone the new repository locally, using GitHub Desktop, RStudio, or something else.
3. Delete the parts of this README that are no longer needed.
4. If R and RStudio are part of your workflow, associate the directory with a project: File > New Project > Existing Directory.

# Overview and Folder structure

This is mostly borrowed from the World Bank's DIME Wiki: https://dimewiki.worldbank.org/wiki/Stata_Coding_Practices
Please use forward slashes (that is, C:/path/to/your/folder) instead of backslashes for Unix/Mac compatibility. I'm forgetful about this.

I keep each project in a separate folder. The location of a Stata do file that sets each project's folder names is stored as a global macro in Stata's startup profile.do. This lets me start working on any of my projects by opening Stata and typing:
```
do $my_project_name
```
RStudio users working in projects can skip this step, but it is still convenient to read paths into variables with the "R_paths_libraries.R" file.
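The contents of R_paths_libraries.R are not shown in this commit view, so here is a minimal sketch of what such a paths file might hold. It assumes the project root is the working directory (the case when an RStudio project is open), and the folder names are hypothetical placeholders, not the template's actual layout. `file.path()` joins components with forward slashes on every platform, in line with the forward-slash advice above.

```r
# Hypothetical sketch of a paths file in the spirit of R_paths_libraries.R.
# Assumption: the working directory is the project root (true in an RStudio project).
# The folder names below are placeholders, not the template's real layout.
my_projdir <- getwd()

# file.path() uses "/" as the separator on Windows, Unix, and Mac alike
code_dir    <- file.path(my_projdir, "R_code")
data_raw    <- file.path(my_projdir, "data_folder", "raw")
data_main   <- file.path(my_projdir, "data_folder", "main")
results_dir <- file.path(my_projdir, "results")
```

Scripts can then `source()` this file once and refer to `data_raw` and friends instead of hard-coding paths. `here::here()` can stand in for `getwd()` when scripts are run from subfolders.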


# On passwords and other confidential information

Basically, you will want to store them in a place that does not get uploaded to GitHub.

For Stata users, there is a description [here](/documentation/project_logistics.md).

For R users, try storing them in [.Rprofile](/R_code/project_logistics/.Rprofile_sample). Store your API keys or Personal Access Tokens in [.Renviron](/R_code/project_logistics/.Renviron_sample).
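Once a key lives in .Renviron, scripts only need to read it back. A minimal sketch, using a hypothetical helper that is not part of this template: it stops with a readable message when the variable is missing and never prints the secret itself.

```r
# Hypothetical helper, not part of the template: read a secret from an
# environment variable (for example one set in .Renviron) and fail loudly
# if it is missing, without ever printing the secret itself.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop(sprintf("%s is not set; add it to your .Renviron and restart R.", name),
         call. = FALSE)
  }
  value
}

# Safe check that reports whether a variable is set without echoing its value
has_env <- function(name) nzchar(Sys.getenv(name))
```

For example, `api_key <- get_required_env("FRED_API_KEY")` at the top of a script stops early with a clear message instead of failing later inside an API call.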

# NOAA Requirements
This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.


1. who worked on this project: Min-Yang Lee
1. when this project was created: Jan, 2021
1. what the project does: Helps people get organized. Shows how to get data from NEFSC oracle
1. why the project is useful: Helps people get organized. Shows how to get data from NEFSC oracle
1. how users can get started with the project: Download and follow the readme
1. where users can get help with your project: email me or open an issue
1. who maintains and contributes to the project: Min-Yang

# License file
See here for the [license file](License.txt)
17 changes: 17 additions & 0 deletions R_code/data_extraction_processing/extraction/FRED_extraction.R
@@ -0,0 +1,17 @@
# Get Economic data from FRED
library(fredr)


# Make sure you have an API key and have set it in your .Renviron or .Rprofile.
# If you have done this properly, the following command should print your API key.

Sys.getenv("FRED_API_KEY")

# Extract the quarterly GDP implicit price deflator (GDPDEF).
deflators <- fredr(
  series_id = "GDPDEF",
  observation_start = as.Date("2007-01-01"),
  observation_end = as.Date("2022-06-01"),
  realtime_start = NULL,
  realtime_end = NULL,
  frequency = "q"
)
77 changes: 77 additions & 0 deletions R_code/data_extraction_processing/extraction/dbi_extraction.R
@@ -0,0 +1,77 @@
library(dplyr)
library(DBI)
library(ROracle)
library(dbplyr)



# DBI and ODBC Connection
# This code assumes that
# An appropriate DSN is stored in the R object "nefsc_users"
# your oracle id is stored in the R object "id"
# your oracle password is stored in the object "novapw"
# star_dbi_odbc <- DBI::dbConnect(odbc::odbc(),
# nefsc_users,
# UID = id,
# PWD = novapw)




# DBI and ROracle Connection
# This code assumes that
# A connection string (nefscusers.connect.string) has been assembled
# your oracle id is stored in the R object "id"
# your oracle password is stored in the object "novapw"


star_dbi_ROracle <- DBI::dbConnect(dbDriver("Oracle"),id, password=novapw, dbname=nefscusers.connect.string)



# SOME Sample queries, not run
# #This takes a little while, because there are about 1.6M rows
# cams_cfdets<-paste0("select Year, camsid, permit, itis_tsn, lndlb, nvl(value,0) as value, state, subtrip
# from CAMS_GARFO.CAMS_LAND
# where (Year>=2020 and Year <= 2021)and permit not in ('000000', '190998','390998')
# and DLR_UTILCD in (0,7)")
#
# # This just pulls 1 month and is much faster (~113k rows)
# cams_cfdets<-paste0("select Year, camsid, permit, itis_tsn, lndlb, nvl(value,0) as value, state, subtrip
# from CAMS_GARFO.CAMS_LAND
# where Year=2021 and MONTH= 8 and permit not in ('000000', '190998','390998')
# and DLR_UTILCD in (0,7)")
#
#
permit_query <- paste0("select * from NEFSC_GARFO.PERMIT_VPS_VESSEL WHERE
  AP_YEAR>=2018
  order by vp_num, ap_year")


# Get data using DBI and ODBC
# permit_data<-dplyr::tbl(star_dbi_odbc,sql(permit_query)) %>%
# collect()
#
# dbDisconnect(star_dbi_odbc)


# Get data using DBI and ROracle
permit_data2 <- dplyr::tbl(star_dbi_ROracle, sql(permit_query)) %>%
  collect()


# cams_spec <- dplyr::tbl(star_dbi_ROracle, sql(cams_cfdets)) %>%
#   collect()


# Some people prefer to reference the whole table with dbplyr's in_schema() and then use tidy verbs on it.
# The table is not downloaded at this point: dbplyr translates filter() into SQL that runs on the server,
# and only the matching rows come back when collect() is called.
VPS_VESSEL <- tbl(star_dbi_ROracle, in_schema("NEFSC_GARFO", "PERMIT_VPS_VESSEL"))


VPS_VESSEL <- VPS_VESSEL %>%
  filter(AP_YEAR >= 2018) %>%
  collect()


dbDisconnect(star_dbi_ROracle)
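The two commented-out CAMS sample queries in dbi_extraction.R differ only in their date filter. A minimal sketch, assuming you want to vary the year range from code, wraps the query text in a hypothetical helper so the filter is written once. The values are pasted straight into the SQL, which is fine for years set in your own script; for anything user-supplied, prefer your driver's bind-variable support.

```r
# Hypothetical helper: build the CAMS_LAND sample query for a year range.
# Assumption: start_year and end_year come from your own script, not from user input,
# because they are pasted directly into the SQL text.
build_cams_query <- function(start_year, end_year,
                             excluded_permits = c("000000", "190998", "390998")) {
  stopifnot(is.numeric(start_year), is.numeric(end_year), start_year <= end_year)

  # Quote each permit for the SQL "not in" list: '000000', '190998', '390998'
  permit_list <- paste0("'", excluded_permits, "'", collapse = ", ")

  paste0(
    "select Year, camsid, permit, itis_tsn, lndlb, nvl(value,0) as value, state, subtrip ",
    "from CAMS_GARFO.CAMS_LAND ",
    "where (Year >= ", start_year, " and Year <= ", end_year, ") ",
    "and permit not in (", permit_list, ") ",
    "and DLR_UTILCD in (0,7)"
  )
}

cams_cfdets <- build_cams_query(2020, 2021)
```

The result drops into the existing pattern, for example `dplyr::tbl(star_dbi_ROracle, sql(cams_cfdets)) %>% collect()`.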

89 changes: 89 additions & 0 deletions R_code/data_extraction_processing/extraction/r_oracle_connection.R
@@ -0,0 +1,89 @@
# This is code that uses Roracle to connect to oracle databases.
# ROracle can be tricky to set up. See this document for instructions https://docs.google.com/document/d/1Qsv_Jfc8CsoG49-qK-2RHdSJzR7v48W7ehZbG9k3RUQ/edit

# your oracle id is stored in the R object "id"
# your oracle password is stored in the R object "novapw"


if (!require(ROracle)) {
  install.packages("ROracle")
  require(ROracle)
}


#### Set things up
here::i_am("R_code/data_extraction_processing/extraction/r_oracle_connection.R")

my_projdir <- here::here()

#this reads in paths and libraries
source(file.path(my_projdir,"R_code","project_logistics","R_paths_libraries.R"))




# DBI and ROracle Connection
# This code assumes that
# your oracle id is stored in the R object "id"
# your oracle password is stored in the object "novapw"


############################################################################################
#First, set up Oracle Connection
############################################################################################

# The following are details needed to connect using ROracle.
drv<-dbDriver("Oracle")
shost <- "<nefsc_users.full.path.to.server.gov>"
port <- port_number_here
ssid <- "<ssid_here>"

nefscusers.connect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", shost, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SERVICE_NAME=", ssid, ")))", sep = "")





START.YEAR= 2015
END.YEAR=2018

#First, pull in permits and tripids into a list.
permit_tripids<-list()
i<-1


for (years in START.YEAR:END.YEAR) {
  users_conn <- ROracle::dbConnect(drv, id, password = novapw, dbname = nefscusers.connect.string)
  querystring <- paste0("select permit, tripid from vtr.veslog", years, "t")
  permit_tripids[[i]] <- dbGetQuery(users_conn, querystring)
  dbDisconnect(users_conn)
  i <- i + 1
}
#flatten the list into a dataframe

permit_tripids<-do.call(rbind.data.frame, permit_tripids)
colnames(permit_tripids)[which(names(permit_tripids) == "PERMIT")] <- "permit"



# Pull in gearcode data frame from sole
users_conn<-ROracle::dbConnect(drv, id, password=novapw, dbname=nefscusers.connect.string)

querystring2<-paste0("select gearcode, negear, negear2, gearnm from vtr.vlgear")
VTRgear<-dbGetQuery(users_conn, querystring2)

dbDisconnect(users_conn)
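The year loop in r_oracle_connection.R opens and closes a new connection on every pass and grows a list by index. A minimal sketch of the same pattern with `lapply()`: the query is built per year exactly as in the script, and the results are stacked with `do.call(rbind, ...)`. To keep it runnable offline, `fetch` is a hypothetical stand-in for a database call and `fake_fetch` returns made-up rows. Lower-casing every column name is a choice made here; the original script renames only PERMIT.

```r
# Sketch of the year loop: one query per annual VTR table, results stacked.
# Assumption: `fetch` stands in for a database call such as
#   function(sql) DBI::dbGetQuery(users_conn, sql)
pull_permit_tripids <- function(years, fetch) {
  pieces <- lapply(years, function(yr) {
    querystring <- paste0("select permit, tripid from vtr.veslog", yr, "t")
    fetch(querystring)
  })
  out <- do.call(rbind, pieces)
  # Oracle returns upper-case column names; lower-case them all for R-side work
  names(out) <- tolower(names(out))
  out
}

# Hypothetical offline stand-in returning two made-up rows per query
fake_fetch <- function(sql) {
  data.frame(PERMIT = c(111111L, 222222L), TRIPID = c(1L, 2L))
}

permit_tripids <- pull_permit_tripids(2015:2018, fake_fetch)
```

With a live database, one connection can serve every year: open `users_conn` once with `ROracle::dbConnect(drv, id, password = novapw, dbname = nefscusers.connect.string)`, call `pull_permit_tripids(START.YEAR:END.YEAR, function(sql) dbGetQuery(users_conn, sql))`, then `dbDisconnect(users_conn)` a single time.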












17 changes: 17 additions & 0 deletions R_code/project_logistics/.Renviron_sample
@@ -0,0 +1,17 @@
# .Renviron is used to set Environment Variables.
# Courtesy of Beth Josephson and Dave Hiltz

# To use this, delete _sample and put this in a place that R searches for a .Renviron file.
# One good place is the result of Sys.getenv("HOME")
# Don't delete the leading period
# This could be helpful if you are using the servers and need to use oracle
# This could be helpful if you are trying to use git


# ORACLE_HOME=/ora1/app/oracle/product/11.2.0/dbhome_1
# ORACLE_HOME=/usr/lib/oracle/21/client64
# GITHUB_PAT=your_pat_goeshere
# FRED_API_KEY=your_API_KEY_goes_here

# R_LIBS_USER = "//net/path/to/your/home/R/x86_64_pc-linux-gnu-library"
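.Renviron is read when R starts, so edits normally take effect only after a restart. Base R's `readRenviron()` loads a file in the same `name=value` format into the running session. A minimal sketch, assuming a temporary file stands in for your real .Renviron and the key is a dummy value:

```r
# Sketch: load a .Renviron-style file into the current session with readRenviron().
# Assumption: a temporary file stands in for ~/.Renviron; the key is a dummy value.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines(c(
  "# Lines starting with # are comments and are ignored",
  "FRED_API_KEY=dummy_key_for_testing"
), renviron_path)

readRenviron(renviron_path)

# Report whether the key is now set, without printing the key itself
nzchar(Sys.getenv("FRED_API_KEY"))
```

Pointing `readRenviron()` at your real file (for example `file.path(Sys.getenv("HOME"), ".Renviron")`) picks up a newly added key without quitting R.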

33 changes: 33 additions & 0 deletions R_code/project_logistics/.Rprofile_sample
@@ -0,0 +1,33 @@
# .Rprofile -- Execute some R commands at the beginning of each R session
# You can use this file to load packages, set options, etc.
# It's a particularly good place to store usernames, passwords, or the location of an oracle server.
# Just make sure you never commit the file to a git repository.


# To use this, delete _sample and put this in a place that R searches for a .Rprofile file.
# One good place to put this file is the result of Sys.getenv("HOME")
# Don't delete the leading period

# NOTE: changes in this file won't be reflected until after you quit
# and start a new session
#

# Load the NEFSC network into a variable
network_location_desktop = "<the_ipaddress_to_the_network>"
# This might work better
network_location_desktop = "blah.blah.noaa.gov"
network_location_remote = "//net"

# You might even want to store your database usernames and passwords here, so they are always loaded into the R environment when you start up R.

id<-"your_user_id_here"
password<-"your_secret_password_here"
database_location<-"path.to.database.gov"

# You can set system environment variables, like a GITHUB or GITLAB Personal Access Token or an API key.
# Sys.setenv(GITHUB_PAT = "YOUR PAT HERE")
# Sys.setenv(GITLAB_PAT="YOUR PAT HERE")
# Sys.setenv(FRED_API_KEY="YOUR API KEY HERE")

# and it might be useful to make R aware of someone else's network share.
# someone_elses_dir<-file.path("/home","your_user_id","path_to_other_dir")
1 change: 1 addition & 0 deletions R_code/project_logistics/.gitignore
@@ -0,0 +1 @@
R_credentials.R