Commit

rnd fixes
simonhuwiler committed Jan 9, 2019
1 parent 73e009f commit df0c7e7
Showing 11 changed files with 27 additions and 180 deletions.
3 changes: 0 additions & 3 deletions data/1. pdfexport/0. info.md
@@ -7,9 +7,6 @@
* Janssen - no PDF, website: https://www.janssen.com/switzerland/de/disclosure

## ToDo
* [Böhringer - List](http://localhost:8888/notebooks/data/1.%20pdfexport/files/Boehringer%20Ingelheim/0.%20Lists.ipynb) Line breaks in names. How should they be handled?
* [Böhringer - Accumulations](http://localhost:8888/notebooks/data/1.%20pdfexport/files/Boehringer%20Ingelheim/1.%20Accumulations.ipynb) - R&D has two values. What do they mean?
* [Allergan - Accumulations](http://localhost:8888/notebooks/data/1.%20pdfexport/files/Allergan/1.%20Accumulations.ipynb) - RnD not available
* [Mundipharma - List](http://localhost:8888/notebooks/data/1.%20pdfexport/files/Mundipharma/0.%20Lists.ipynb) - No address!

## Info
2 changes: 1 addition & 1 deletion data/1. pdfexport/export/accumulations/amgen.csv
@@ -3,4 +3,4 @@ hcp_amount,,,16450.76,46566.18,77991.73,42371.04,183379.71,amgen
hco_amount,,2130.0,574.66,6142.05,96604.77,3958.25,109409.73,amgen
hcp_count,,,37.0,41.0,26.0,13.0,62.0,amgen
hco_count,0.0,1.0,1.0,1.0,1.0,1.0,1.0,amgen
rnd,,,,,,,,amgen
rnd,,,,,,,1226796.97,amgen
2 changes: 1 addition & 1 deletion data/1. pdfexport/export/accumulations/boehringer.csv
@@ -3,4 +3,4 @@ hcp_amount,,,,79006.65,138437.24,10447.12,227891.01,boehringer
hco_amount,1135965.56,654838.18,,,11761.53,673.5,1803238.77,boehringer
hcp_count,,,,53.0,120.0,11.0,157.0,boehringer
hco_count,4.0,17.0,,,3.0,1.0,22.0,boehringer
rnd,,,,,,,6505476.67,boehringer
rnd,,,,,,,1847280.27,boehringer
2 changes: 1 addition & 1 deletion data/1. pdfexport/export/accumulations/jansen.csv
@@ -3,4 +3,4 @@ hcp_amount,,,8549.0,25200.0,14768.0,4026.0,52543.0,jansen
hcp_count,,0.0,17.0,13.0,7.0,7.0,44.0,jansen
hco_amount,477262.0,162446.0,0.0,0.0,75397.0,,715105.0,jansen
hco_count,3.0,12.0,0.0,0.0,2.0,,17.0,jansen
rnd,,,,,,,0.0,jansen
rnd,,,,,,,1291864.0,jansen
2 changes: 1 addition & 1 deletion data/1. pdfexport/export/accumulations/msd.csv
@@ -3,4 +3,4 @@ hcp_amount,,,12336.6,88376.33,192201.35,12167.87,324943.34,msd
hco_amount,64209.59,720819.03,36524.15,36951.63,63627.6,1800.0,1167506.72,msd
hcp_count,,,32.0,53.0,64.0,15.0,,msd
hco_count,2.0,58.0,7.0,9.0,18.0,1.0,,msd
rnd,,,,,,,9109602.41,msd
rnd,,,,,,,3436188.36,msd
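After this commit each accumulation CSV carries the R&D figure only in the `total` field of its `rnd` row. A small sketch for checking that invariant, assuming the column names shown in the Boehringer notebook's `df_export.head()` output (the first CSV line is not part of the diff, so the header is an assumption). `load_accumulations` and `rnd_only_in_total` are hypothetical names, and the sample parses rows without a header; the actual files probably start with a header line, in which case drop `header=None, names=...`.

```python
import io

import pandas as pd

# Assumed column order, taken from the Boehringer notebook's df_export.head().
COLUMNS = ["type", "donations_grants", "sponsorship", "registration_fees",
           "travel_accommodation", "fees", "related_expenses", "total", "source"]

# Rows of amgen.csv as they look after this commit (copied from the diff).
AMGEN_CSV = """\
hcp_amount,,,16450.76,46566.18,77991.73,42371.04,183379.71,amgen
hco_amount,,2130.0,574.66,6142.05,96604.77,3958.25,109409.73,amgen
hcp_count,,,37.0,41.0,26.0,13.0,62.0,amgen
hco_count,0.0,1.0,1.0,1.0,1.0,1.0,1.0,amgen
rnd,,,,,,,1226796.97,amgen
"""


def load_accumulations(text):
    """Parse header-less accumulation rows; empty fields become NaN."""
    return pd.read_csv(io.StringIO(text), header=None, names=COLUMNS)


def rnd_only_in_total(df):
    """True if there is an 'rnd' row, it has a total, and no category amounts."""
    rnd = df[df["type"] == "rnd"]
    category_cols = COLUMNS[1:-2]  # donations_grants .. related_expenses
    return bool(not rnd.empty
                and rnd["total"].notna().all()
                and rnd[category_cols].isna().all().all())


df = load_accumulations(AMGEN_CSV)
print(rnd_only_in_total(df))  # True
```

The pre-commit row `rnd,,,,,,,,amgen` fails the check, since its `total` is empty.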
9 changes: 6 additions & 3 deletions data/1. pdfexport/files/Amgen/1. Accumulations.ipynb
@@ -9,7 +9,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -39,7 +39,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -83,6 +83,9 @@
"df_export.iloc[2,0] = \"hcp_count\"\n",
"df_export.iloc[3,0] = \"hco_count\"\n",
"df_export.iloc[4,0] = \"rnd\"\n",
"\n",
"#Fix RnD\n",
"df_export.loc[df_export.type == 'rnd', \"total\"] = df_export['donations_grants']\n",
"df_export.loc[df_export.type == 'rnd', \"donations_grants\"] = np.NaN\n",
"\n",
"export_acumulations(df_export, 'amgen')\n"
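The `#Fix RnD` lines in the Amgen notebook (and the similar ones for Boehringer and MSD) move the R&D amount out of a category column into `total` and then blank the category cell. A minimal sketch of that step as a function, assuming the R&D figure was first parsed into `donations_grants`; `move_rnd_to_total` is a hypothetical name, not a helper from the repository. It uses `np.nan`, because the `np.NaN` alias used in the notebooks was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd


def move_rnd_to_total(df, source_col="donations_grants", clear_cols=()):
    """Hypothetical helper: copy the R&D amount from `source_col` into `total`
    for the 'rnd' row(s), then set `source_col` and any `clear_cols` to NaN."""
    df = df.copy()  # leave the caller's frame untouched
    mask = df["type"] == "rnd"
    # Select the right-hand side with the same mask, so the assignment is
    # row-for-row instead of relying on index alignment of the full column.
    df.loc[mask, "total"] = df.loc[mask, source_col]
    for col in (source_col, *clear_cols):
        df.loc[mask, col] = np.nan
    return df


# Example with the Amgen R&D figure from amgen.csv.
df_export = pd.DataFrame({
    "type": ["hcp_amount", "rnd"],
    "donations_grants": [np.nan, 1226796.97],
    "sponsorship": [np.nan, np.nan],
    "total": [183379.71, np.nan],
})
fixed = move_rnd_to_total(df_export, clear_cols=["sponsorship"])
print(fixed)
```

Passing `clear_cols=["sponsorship"]` mirrors what the MSD notebook does after this commit: `total` is taken from `donations_grants` only, and both category cells are cleared.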
169 changes: 6 additions & 163 deletions data/1. pdfexport/files/Boehringer Ingelheim/1. Accumulations.ipynb
@@ -4,16 +4,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Böhringer - Accumulations - TODOS!\n",
"R&D has two values. What do they mean?\n",
"# Böhringer - Accumulations\n",
"\n",
"## Info\n",
"* The total for the number of recipients is sometimes wrong (incorrect in the source data)"
"* Caution: the \"Total\" column in RnD is the total over the entire PDF!"
]
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
@@ -33,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
@@ -43,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": 17,
"metadata": {},
"outputs": [
{
@@ -93,169 +92,13 @@
"df_export.iloc[3,0] = \"hco_count\"\n",
"df_export.iloc[4,0] = \"rnd\"\n",
"\n",
"df_export.loc[df_export.type == 'rnd', \"total\"] = df_export.loc[df_export.type == 'rnd', \"sponsorship\"]\n",
"df_export.loc[df_export.type == 'rnd', \"total\"] = df_export[\"donations_grants\"]\n",
"df_export.loc[df_export.type == 'rnd', \"donations_grants\"] = np.NaN\n",
"df_export.loc[df_export.type == 'rnd', \"sponsorship\"] = np.NaN\n",
"\n",
"export_acumulations(df_export, 'boehringer')\n"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().",
"output_type": "error",
"traceback": [
"---------------------------------------------------------------------------",
"ValueError                                Traceback (most recent call last)",
"<ipython-input-89-845960bedb57> in <module>()\n      1 t = df_export[df_export.type == \"rnd\"]\n----> 2 if t['registration_fees'].isna():\n      3     print(\"x)\")\n      4 #if pd.isna(t[\"registration_fees\"]):\n      5 #    print(\"nan\")\n",
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)\n   1574         raise ValueError(\"The truth value of a {0} is ambiguous. \"\n   1575                          \"Use a.empty, a.bool(), a.item(), a.any() or a.all().\"\n-> 1576                          .format(self.__class__.__name__))\n   1577 \n   1578     __bool__ = __nonzero__\n",
"ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
]
}
],
"source": [
"t = df_export[df_export.type == \"rnd\"]\n",
"if t['registration_fees'].isna():\n",
" print(\"x)\")\n",
"#if pd.isna(t[\"registration_fees\"]):\n",
"# print(\"nan\")\n",
"\n",
"#if pd.isna(df_export.loc[df_export.type == 'rnd', 'total']):\n",
"# print(4)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>type</th>\n",
" <th>donations_grants</th>\n",
" <th>sponsorship</th>\n",
" <th>registration_fees</th>\n",
" <th>travel_accommodation</th>\n",
" <th>fees</th>\n",
" <th>related_expenses</th>\n",
" <th>total</th>\n",
" <th>source</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>hcp_amount</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>79006.65</td>\n",
" <td>138437.24</td>\n",
" <td>10447.12</td>\n",
" <td>227891.01</td>\n",
" <td>boehringer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>hco_amount</td>\n",
" <td>1135965.56</td>\n",
" <td>654838.18</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>11761.53</td>\n",
" <td>673.50</td>\n",
" <td>1803238.77</td>\n",
" <td>boehringer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>hcp_count</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>53.00</td>\n",
" <td>120.00</td>\n",
" <td>11.00</td>\n",
" <td>157.00</td>\n",
" <td>boehringer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>hco_count</td>\n",
" <td>4.00</td>\n",
" <td>17.00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.00</td>\n",
" <td>1.00</td>\n",
" <td>22.00</td>\n",
" <td>boehringer</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>rnd</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6505476.67</td>\n",
" <td>boehringer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" type donations_grants sponsorship registration_fees \\\n",
"0 hcp_amount NaN NaN NaN \n",
"1 hco_amount 1135965.56 654838.18 NaN \n",
"2 hcp_count NaN NaN NaN \n",
"3 hco_count 4.00 17.00 NaN \n",
"4 rnd NaN NaN NaN \n",
"\n",
" travel_accommodation fees related_expenses total source \n",
"0 79006.65 138437.24 10447.12 227891.01 boehringer \n",
"1 NaN 11761.53 673.50 1803238.77 boehringer \n",
"2 53.00 120.00 11.00 157.00 boehringer \n",
"3 NaN 3.00 1.00 22.00 boehringer \n",
"4 NaN NaN NaN 6505476.67 boehringer "
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_export.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
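The cell removed from the Boehringer notebook failed because `t['registration_fees'].isna()` returns a boolean Series, and `if` cannot reduce a whole Series to one truth value. A minimal sketch of the two usual fixes, using a small stand-in for `df_export` (the totals are taken from boehringer.csv after this commit):

```python
import numpy as np
import pandas as pd

# Small stand-in for df_export; only the columns needed here.
df_export = pd.DataFrame({
    "type": ["hcp_amount", "rnd"],
    "registration_fees": [np.nan, np.nan],
    "total": [227891.01, 1847280.27],
})

t = df_export[df_export["type"] == "rnd"]

# `if t["registration_fees"].isna():` raises ValueError: isna() returns a
# boolean Series, and pandas refuses to guess whether "any" or "all" is meant.

# Fix 1: reduce the boolean Series explicitly.
if t["registration_fees"].isna().all():
    print("registration_fees is NaN in every rnd row")

# Fix 2: if exactly one rnd row is expected, extract the scalar first.
# Series.item() raises ValueError unless the Series has exactly one element.
value = t["registration_fees"].item()
if pd.isna(value):
    print("nan")
```

Fix 2 also acts as a guard: if the PDF export ever produced two `rnd` rows, `.item()` fails loudly instead of silently picking one.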
10 changes: 6 additions & 4 deletions data/1. pdfexport/files/Jansen-Cilag/1. Accumulations.ipynb
@@ -44,7 +44,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -105,20 +105,22 @@
"# RnD\n",
"df_export = df_export.append({\n",
" 'type':'rnd',\n",
" 'donations_grants': '',\n",
" 'donations_grants': '1 291 864 CHF',\n",
" 'sponsorship': '',\n",
" 'registration_fees': '',\n",
" 'travel_accommodation': '',\n",
" 'fees': '',\n",
" 'related_expenses': '',\n",
" 'total': '1 291 864 CHF'\n",
" 'related_expenses': ''\n",
" }, ignore_index=True)\n",
"\n",
"#Numberize and sum\n",
"df_export = cleanup_number(df_export)\n",
"df_export = amounts_to_number(df_export)\n",
"df_export = sum_amounts(df_export)\n",
"\n",
"#Fix RnD\n",
"df_export.loc[df_export.type == 'rnd', 'donations_grants'] = np.NaN\n",
"\n",
"export_acumulations(df_export, 'jansen')"
]
},
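In the Jansen notebook the R&D amount is now placed in `donations_grants` before the numberize-and-sum step and only cleared afterwards. The regenerated jansen.csv changes the `rnd` total from 0.0 to 1291864.0, which suggests that `sum_amounts` recomputes `total` from the category columns. The bodies of `cleanup_number`, `amounts_to_number` and `sum_amounts` are not part of this diff, so the sketch below uses hypothetical stand-ins (`parse_chf`, `add_rnd_row`, `numberize_and_sum`) built on that assumption. It also replaces `DataFrame.append`, which was removed in pandas 2.0, with `pd.concat`.

```python
import re

import numpy as np
import pandas as pd

CATEGORY_COLS = ["donations_grants", "sponsorship", "registration_fees",
                 "travel_accommodation", "fees", "related_expenses"]


def parse_chf(text):
    """Hypothetical stand-in for cleanup_number/amounts_to_number: keep only
    digits, '.' and '-' ('1 291 864 CHF' -> 1291864.0); empty -> NaN.
    Assumption: amounts never use a decimal comma."""
    digits = re.sub(r"[^\d.\-]", "", str(text))
    return float(digits) if digits else np.nan


def add_rnd_row(df, amount_text):
    """Append an 'rnd' row whose amount sits in donations_grants (all other
    categories empty), so that the category sum produces the R&D total."""
    row = {"type": "rnd", **{col: "" for col in CATEGORY_COLS}}
    row["donations_grants"] = amount_text
    # DataFrame.append no longer exists in pandas 2.x; pd.concat replaces it.
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


def numberize_and_sum(df):
    """Hypothetical stand-in for sum_amounts: convert categories to floats and
    set total to their row sum (an all-empty row stays NaN via min_count=1)."""
    df = df.copy()
    for col in CATEGORY_COLS:
        df[col] = df[col].map(parse_chf)
    df["total"] = df[CATEGORY_COLS].sum(axis=1, min_count=1)
    return df


# The hco_count row of jansen.csv, written as raw strings for the example.
df_export = pd.DataFrame([{"type": "hco_count", "donations_grants": "3",
                           "sponsorship": "12", "registration_fees": "0",
                           "travel_accommodation": "0", "fees": "2",
                           "related_expenses": ""}])
df_export = add_rnd_row(df_export, "1 291 864 CHF")
df_export = numberize_and_sum(df_export)

# Fix RnD: the amount should remain only in 'total'.
df_export.loc[df_export["type"] == "rnd", "donations_grants"] = np.nan
print(df_export[["type", "donations_grants", "total"]])
```

Under this assumption, putting the amount only into `total` (as the pre-commit code did) would be overwritten by the recomputed category sum, which is consistent with the old 0.0 in jansen.csv.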
6 changes: 4 additions & 2 deletions data/1. pdfexport/files/MSD/1. Accumulations.ipynb
@@ -4,7 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# MSD - Accumulations"
"# MSD - Accumulations\n",
"## Info\n",
"* Total in RnD is the grand total of the whole PDF!"
]
},
{
@@ -119,7 +121,7 @@
"df_export.iloc[3,0] = \"hco_count\"\n",
"df_export.iloc[4,0] = \"rnd\"\n",
"\n",
"df_export.loc[df_export.type == 'rnd', \"total\"] = df_export.loc[df_export.type == 'rnd', \"sponsorship\"] + df_export.loc[df_export.type == 'rnd', \"donations_grants\"]\n",
"df_export.loc[df_export.type == 'rnd', \"total\"] = df_export.loc[df_export.type == 'rnd', \"donations_grants\"]\n",
"df_export.loc[df_export.type == 'rnd', \"sponsorship\"] = np.NaN\n",
"df_export.loc[df_export.type == 'rnd', \"donations_grants\"] = np.NaN\n",
"\n",
Binary file modified data/1. pdfexport/files/MSD/unlocked.pdf
Binary file not shown.
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sandoz - List\n",
"# Sandoz - List - TODO\n",
"## Note\n",
"* OCR was done with ABBYY Fine Reader for Mac (license available)\n",
"* Important: set the languages to German, English, Italian, French\n",
