Bug fix
Signed-off-by: Stefano Savare <[email protected]>
deatinor committed Jun 15, 2021
1 parent bd9551a commit 60c3db7
Showing 4 changed files with 59 additions and 3 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/documentation.yml
@@ -21,7 +21,6 @@ jobs:
make doc-install
- name: Build sphinx documentation
run: |
-ls ./docs
make documentation
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
@@ -31,7 +30,6 @@ jobs:
aws-region: us-east-1
- name: Deploy static site to S3 bucket
run: |
-ls ./docs
aws s3 rm s3://${{ secrets.AWS_DOCUMENTATION_BUCKET }}/documentation --recursive
aws s3 sync ./docs/documentation/ s3://${{ secrets.AWS_DOCUMENTATION_BUCKET }}/deep-experiments/ --delete
# - name: gcloud auth
1 change: 1 addition & 0 deletions Makefile
@@ -29,6 +29,7 @@ streamlit-deploy:
docker push 961104659532.dkr.ecr.us-east-1.amazonaws.com/streamlit

documentation:
+rm -rf docs/documentation
sphinx-build -b html docs/source docs/documentation

documentation-push:
57 changes: 57 additions & 0 deletions docs/source/dfs-data/description.rst
@@ -0,0 +1,57 @@
Data
====

The data for the project is in the folder ``data``.
The ``immap`` subfolder contains only IMMAP data, while ``frameworks_data`` contains everything else.

What is our data?
------------------------

Our data consists of roughly 100,000 sentences extracted from PDFs and web articles. Each sentence has been manually
labeled by taggers. Multiple labels are associated with each sentence:

- Sector
- Pillar
- Subpillar

We want to predict all of them. Each subpillar belongs to one and only one pillar.
More labels are coming.
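
The constraint that each subpillar belongs to exactly one pillar can be checked directly on the labels. The following is a minimal sketch, not code from this project: the record layout (``pillar`` and ``subpillar`` keys) and the sample label values are assumptions made purely for illustration.

```python
from collections import defaultdict


def subpillars_with_multiple_pillars(records):
    """Return {subpillar: set of pillars} for subpillars seen under more than one pillar."""
    pillars_by_subpillar = defaultdict(set)
    for record in records:
        pillars_by_subpillar[record["subpillar"]].add(record["pillar"])
    # Keep only the subpillars that break the "exactly one pillar" rule.
    return {
        subpillar: pillars
        for subpillar, pillars in pillars_by_subpillar.items()
        if len(pillars) > 1
    }


# Hypothetical label values, not taken from the real dataset.
sample = [
    {"pillar": "Context", "subpillar": "Economy"},
    {"pillar": "Context", "subpillar": "Politics"},
    {"pillar": "Impact", "subpillar": "Economy"},  # same subpillar, different pillar
]

violations = subpillars_with_multiple_pillars(sample)
print(violations)
```

An empty result means the hierarchy is respected; any entry points to a subpillar whose tags should be reviewed.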

How good is the data?
---------------------

Not very good: some classes are ambiguous and, because we have multiple taggers, their tags are not
always consistent.
However, we have a lot of data, which helps.
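
One rough way to quantify tagger inconsistency is an agreement rate. The sketch below assumes that at least some sentences were labeled by more than one tagger, which this page does not state. The input layout (a mapping from a sentence id to the list of labels it received) and the sample labels are hypothetical.

```python
def agreement_rate(tags_by_sentence):
    """Fraction of multiply-tagged sentences on which all taggers chose the same label.

    Returns None when no sentence has more than one tag.
    """
    multiply_tagged = [tags for tags in tags_by_sentence.values() if len(tags) > 1]
    if not multiply_tagged:
        return None
    agreed = sum(1 for tags in multiply_tagged if len(set(tags)) == 1)
    return agreed / len(multiply_tagged)


# Hypothetical sentence ids and sector labels.
sample_tags = {
    "s1": ["Health", "Health"],      # taggers agree
    "s2": ["Health", "Protection"],  # taggers disagree
    "s3": ["Education"],             # single tagger, ignored
}

print(agreement_rate(sample_tags))
```

This simple rate does not correct for agreement by chance, so it is only a first indication of label quality.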


Which data should I use?
------------------------

We are currently working only with ``frameworks_data`` and the most recent data version.

We advise you to import the constants:

.. code-block:: python

    from deep.constants import *

The variable ``LATEST_DATA_PATH`` points to the most recent version of the data.

What are the differences between the data versions?
----------------------------------------------------

Apart from the dataset queried (IMMAP vs. all), the versions differ mainly in bug fixes and better class definitions.
Please use the latest version.

How do I get the most recent version?
-------------------------------------

We use `DVC <https://dvc.org>`_ to manage data. Simply run

.. code-block:: bash

    dvc pull

to get the most recent version.

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -9,7 +9,7 @@
:maxdepth: 1
:caption: Data:

-data/description
+dfs-data/description

.. toctree::
:maxdepth: 1