Commit
Adjust README files; remove members files.
Showing 9 changed files with 0 additions and 343 deletions.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,108 +22,3 @@ We believe that these datasets will allow us to answer the following question: " | |
|
||
# Pulling CSVs | ||
Two of our datasets, the census and roads data, are CSVs. We uploaded them to datamechanics.io and pull them in their respective get modules. Each file is pulled as a string, which we parse line by line into dictionaries and insert into MongoDB. | ||
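The parsing step described above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the code from the get modules: the function name `csv_text_to_dicts`, the sample columns, and the collection name in the closing comment are all hypothetical.

```python
import csv
import io


def csv_text_to_dicts(text):
    """Parse CSV text (header row first) into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    # Each row becomes a dict keyed by the header names; values stay strings.
    return [dict(row) for row in reader]


# Hypothetical sample standing in for a CSV pulled as a string.
sample = "tract,population\n000100,4200\n000200,3100\n"
records = csv_text_to_dicts(sample)
print(records)

# Assumption: with pymongo, the records could then be stored with a call like
#   repo["alice_bob.census"].insert_many(records)
```

Using `csv.DictReader` rather than splitting on commas by hand keeps quoted fields that contain commas intact.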
|
||
# course-2017-fal-proj | ||
Joint repository for the collection of student course projects in the Fall 2017 iteration of the Data Mechanics course at Boston University. | ||
|
||
In this project, you will implement platform components that can obtain some data sets from web services of your choice, and platform components that combine these data sets into at least two additional derived data sets. These components will interact with the backend repository by inserting and retrieving data sets as necessary. They will also satisfy a standard interface by supporting specified capabilities (such as generation of dependency information and provenance records). | ||
|
||
**This project description will be updated as we continue work on the infrastructure.** | ||
|
||
## MongoDB infrastructure | ||
|
||
### Setting up | ||
|
||
We have committed setup scripts for a MongoDB database that will set up the database and collection management functions that ensure users sharing the project data repository can read everyone's collections but can only write to their own collections. Once you have installed your MongoDB instance, you can prepare it by first starting `mongod` _without authentication_: | ||
``` | ||
mongod --dbpath "<your_db_path>" | ||
``` | ||
If you're setting up after previously running `setup.js`, you may want to reset (i.e., delete) the repository as follows. | ||
``` | ||
mongo reset.js | ||
``` | ||
Next, make sure your user directories (e.g., `alice_bob` if Alice and Bob are working together on a team) are present in the same location as the `setup.js` script, open a separate terminal window, and run the script: | ||
``` | ||
mongo setup.js | ||
``` | ||
Your MongoDB instance should now be ready. Stop `mongod` and restart it, enabling authentication with the `--auth` option: | ||
``` | ||
mongod --auth --dbpath "<your_db_path>" | ||
``` | ||
|
||
### Working on data sets with authentication | ||
|
||
With authentication enabled, you can start `mongo` on the repository (called `repo` by default) with your user credentials: | ||
``` | ||
mongo repo -u alice_bob -p alice_bob --authenticationDatabase "repo" | ||
``` | ||
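The same credentials can be used from Python. The sketch below only builds a connection URI with the standard library; the helper name `repo_uri` and the host and port defaults are assumptions, and passing the URI to `pymongo.MongoClient` assumes `pymongo` is installed and `mongod` is running with `--auth`.

```python
from urllib.parse import quote_plus


def repo_uri(user, password, host="localhost", port=27017, db="repo"):
    """Build a MongoDB connection URI that authenticates against `db`."""
    # quote_plus escapes characters such as '@' or ':' in the credentials.
    return "mongodb://{}:{}@{}:{}/{}?authSource={}".format(
        quote_plus(user), quote_plus(password), host, port, db, db
    )


uri = repo_uri("alice_bob", "alice_bob")
print(uri)

# Assumption: with pymongo available, the connection would look like
#   client = pymongo.MongoClient(uri)
#   repo = client["repo"]
```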
However, you should be unable to create new collections using `db.createCollection()` in the default `repo` database created for this project: | ||
``` | ||
> db.createCollection("EXAMPLE"); | ||
{ | ||
"ok" : 0, | ||
"errmsg" : "not authorized on repo to execute command { create: \"EXAMPLE\" }", | ||
"code" : 13 | ||
} | ||
``` | ||
Instead, load the server-side functions so that you can use the customized `createCollection()` function, which creates a collection that can be read by everyone but written only by you: | ||
``` | ||
> db.loadServerScripts(); | ||
> var EXAMPLE = createCollection("EXAMPLE"); | ||
``` | ||
Notice that this function also prefixes the user name to the name of the collection (unless the prefix is already present in the name supplied to the function). | ||
``` | ||
> EXAMPLE | ||
alice_bob.EXAMPLE | ||
> db.alice_bob.EXAMPLE.insert({value:123}) | ||
WriteResult({ "nInserted" : 1 }) | ||
> db.alice_bob.EXAMPLE.find() | ||
{ "_id" : ObjectId("56b7adef3503ebd45080bd87"), "value" : 123 } | ||
``` | ||
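The naming rule described above (prefix the user name unless it is already there) can be mirrored in Python when computing collection names ahead of time. This is an illustration of the described behavior only, not the server-side script itself; `qualified_name` is a hypothetical helper.

```python
def qualified_name(user, name):
    """Return `name` prefixed with `user.` unless the prefix is already present."""
    prefix = user + "."
    return name if name.startswith(prefix) else prefix + name


print(qualified_name("alice_bob", "EXAMPLE"))
print(qualified_name("alice_bob", "alice_bob.EXAMPLE"))
```

Both calls yield the same collection name, so the helper is safe to apply more than once.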
If you do not want to run `db.loadServerScripts()` every time you open a new terminal, you can use a `.mongorc.js` file in your home directory to store any commands or calls you want issued whenever you run `mongo`. | ||
|
||
## Other required libraries and tools | ||
|
||
You will need the latest versions of the PROV, DML, and Protoql Python libraries. If you have `pip` installed, the following should install the latest versions automatically: | ||
``` | ||
pip install prov --upgrade --no-cache-dir | ||
pip install dml --upgrade --no-cache-dir | ||
pip install protoql --upgrade --no-cache-dir | ||
``` | ||
If you are having trouble installing `lxml` in a Windows environment, you could try retrieving it [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/). | ||
|
||
Note that you may need to use `python -m pip install <library>` to avoid issues if you have multiple versions of `pip` and Python on your system. | ||
|
||
## Formatting the `auth.json` file | ||
|
||
The `auth.json` file should remain empty and should not be submitted. When you are running your algorithms, you should use the file to store your credentials for any third-party data resources, APIs, services, or repositories that you use. An example of the contents you might store in your `auth.json` file is as follows: | ||
``` | ||
{ | ||
"services": { | ||
"cityofbostondataportal": { | ||
"service": "https://data.cityofboston.gov/", | ||
"username": "[email protected]", | ||
"token": "XxXXXXxXxXxXxxXXXXxxXxXxX", | ||
"key": "xxXxXXXXXXxxXXXxXXXXXXxxXxxxxXXxXxxX" | ||
}, | ||
"mbtadeveloperportal": { | ||
"service": "http://realtime.mbta.com/", | ||
"username": "alice_bob", | ||
"token": "XxXX-XXxxXXxXxXXxXxX_x", | ||
"key": "XxXX-XXxxXXxXxXXxXxx_x" | ||
} | ||
} | ||
} | ||
``` | ||
To access the contents of the `auth.json` file after you have loaded the `dml` library, use `dml.auth`. | ||
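As a rough sketch of how credentials in this format might be looked up, the example below parses a shortened copy of the JSON with the standard `json` module. It assumes that `dml.auth` exposes the same nested dictionary; the `credential` helper and the placeholder values are hypothetical.

```python
import json

# Shortened placeholder contents in the same shape as the example above.
auth_text = """
{
    "services": {
        "mbtadeveloperportal": {
            "service": "http://realtime.mbta.com/",
            "username": "alice_bob",
            "token": "XxXX",
            "key": "XxXX"
        }
    }
}
"""
auth = json.loads(auth_text)


def credential(auth, service, field):
    """Return one credential field, with a clearer error if it is missing."""
    try:
        return auth["services"][service][field]
    except KeyError as err:
        raise KeyError(
            "auth.json has no {!r} entry for service {!r}".format(field, service)
        ) from err


print(credential(auth, "mbtadeveloperportal", "username"))
```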
|
||
## Running the execution script for a contributed project | ||
|
||
To execute all the algorithms for a particular contributor (e.g., `alice_bob`) in an order that respects their explicitly specified data flow dependencies, you can run the following from the root directory: | ||
``` | ||
python execute.py alice_bob | ||
``` | ||
To execute the algorithms for a particular contributor in trial mode, use the `-t` or `--trial` option: | ||
``` | ||
python execute.py alice_bob --trial | ||
``` |
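Running algorithms in an order that respects their dependencies amounts to a topological sort. The sketch below shows the idea with the standard `graphlib` module (Python 3.9+); it is not how `execute.py` is implemented, and the algorithm names are hypothetical.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

# Hypothetical algorithms, each mapped to the algorithms it depends on.
dependencies = {
    "get_crimes": set(),
    "get_property": set(),
    "crimes_property": {"get_crimes", "get_property"},
}


def execution_order(deps):
    """Return the algorithms in an order where dependencies always run first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful signal that no valid execution order exists.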
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,3 @@ | ||
# course-2017-fal-proj | ||
|
||
# Narrative | ||
|
||
For this project, we set out to answer a question that matters in city environments: what factors contribute to criminal activity? To answer it, we gathered data that we believe may affect the crimes that occur in the Boston area. We transformed datasets covering crime reports, MBTA schedules, property values, employee earnings, and education in order to analyze the impact each of these factors may have on criminal activity in the Boston area. | ||
|
@@ -28,108 +26,3 @@ crimesSorted.py - creates a collection that groups all crimes that occur based o | |
crimesProperty.py - creates a collection that pairs each crime and the name of the street it occurred on with the property value data for that street. | ||
|
||
mbta_ln.py - creates a collection that groups late night routes based on weekday. | ||
|
||
|
||
Joint repository for the collection of student course projects in the Fall 2017 iteration of the Data Mechanics course at Boston University. | ||
|
||
In this project, you will implement platform components that can obtain some data sets from web services of your choice, and platform components that combine these data sets into at least two additional derived data sets. These components will interact with the backend repository by inserting and retrieving data sets as necessary. They will also satisfy a standard interface by supporting specified capabilities (such as generation of dependency information and provenance records). | ||
|
||
**This project description will be updated as we continue work on the infrastructure.** | ||
|
||
## MongoDB infrastructure | ||
|
||
### Setting up | ||
|
||
We have committed setup scripts for a MongoDB database that will set up the database and collection management functions that ensure users sharing the project data repository can read everyone's collections but can only write to their own collections. Once you have installed your MongoDB instance, you can prepare it by first starting `mongod` _without authentication_: | ||
``` | ||
mongod --dbpath "<your_db_path>" | ||
``` | ||
If you're setting up after previously running `setup.js`, you may want to reset (i.e., delete) the repository as follows. | ||
``` | ||
mongo reset.js | ||
``` | ||
Next, make sure your user directories (e.g., `alice_bob` if Alice and Bob are working together on a team) are present in the same location as the `setup.js` script, open a separate terminal window, and run the script: | ||
``` | ||
mongo setup.js | ||
``` | ||
Your MongoDB instance should now be ready. Stop `mongod` and restart it, enabling authentication with the `--auth` option: | ||
``` | ||
mongod --auth --dbpath "<your_db_path>" | ||
``` | ||
|
||
### Working on data sets with authentication | ||
|
||
With authentication enabled, you can start `mongo` on the repository (called `repo` by default) with your user credentials: | ||
``` | ||
mongo repo -u alice_bob -p alice_bob --authenticationDatabase "repo" | ||
``` | ||
However, you should be unable to create new collections using `db.createCollection()` in the default `repo` database created for this project: | ||
``` | ||
> db.createCollection("EXAMPLE"); | ||
{ | ||
"ok" : 0, | ||
"errmsg" : "not authorized on repo to execute command { create: \"EXAMPLE\" }", | ||
"code" : 13 | ||
} | ||
``` | ||
Instead, load the server-side functions so that you can use the customized `createCollection()` function, which creates a collection that can be read by everyone but written only by you: | ||
``` | ||
> db.loadServerScripts(); | ||
> var EXAMPLE = createCollection("EXAMPLE"); | ||
``` | ||
Notice that this function also prefixes the user name to the name of the collection (unless the prefix is already present in the name supplied to the function). | ||
``` | ||
> EXAMPLE | ||
alice_bob.EXAMPLE | ||
> db.alice_bob.EXAMPLE.insert({value:123}) | ||
WriteResult({ "nInserted" : 1 }) | ||
> db.alice_bob.EXAMPLE.find() | ||
{ "_id" : ObjectId("56b7adef3503ebd45080bd87"), "value" : 123 } | ||
``` | ||
If you do not want to run `db.loadServerScripts()` every time you open a new terminal, you can use a `.mongorc.js` file in your home directory to store any commands or calls you want issued whenever you run `mongo`. | ||
|
||
## Other required libraries and tools | ||
|
||
You will need the latest versions of the PROV, DML, and Protoql Python libraries. If you have `pip` installed, the following should install the latest versions automatically: | ||
``` | ||
pip install prov --upgrade --no-cache-dir | ||
pip install dml --upgrade --no-cache-dir | ||
pip install protoql --upgrade --no-cache-dir | ||
``` | ||
If you are having trouble installing `lxml` in a Windows environment, you could try retrieving it [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/). | ||
|
||
Note that you may need to use `python -m pip install <library>` to avoid issues if you have multiple versions of `pip` and Python on your system. | ||
|
||
## Formatting the `auth.json` file | ||
|
||
The `auth.json` file should remain empty and should not be submitted. When you are running your algorithms, you should use the file to store your credentials for any third-party data resources, APIs, services, or repositories that you use. An example of the contents you might store in your `auth.json` file is as follows: | ||
``` | ||
{ | ||
"services": { | ||
"cityofbostondataportal": { | ||
"service": "https://data.cityofboston.gov/", | ||
"username": "[email protected]", | ||
"token": "XxXXXXxXxXxXxxXXXXxxXxXxX", | ||
"key": "xxXxXXXXXXxxXXXxXXXXXXxxXxxxxXXxXxxX" | ||
}, | ||
"mbtadeveloperportal": { | ||
"service": "http://realtime.mbta.com/", | ||
"username": "alice_bob", | ||
"token": "XxXX-XXxxXXxXxXXxXxX_x", | ||
"key": "XxXX-XXxxXXxXxXXxXxx_x" | ||
} | ||
} | ||
} | ||
``` | ||
To access the contents of the `auth.json` file after you have loaded the `dml` library, use `dml.auth`. | ||
|
||
## Running the execution script for a contributed project | ||
|
||
To execute all the algorithms for a particular contributor (e.g., `alice_bob`) in an order that respects their explicitly specified data flow dependencies, you can run the following from the root directory: | ||
``` | ||
python execute.py alice_bob | ||
``` | ||
To execute the algorithms for a particular contributor in trial mode, use the `-t` or `--trial` option: | ||
``` | ||
python execute.py alice_bob --trial | ||
``` |