Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 271 changed files with 29,331 additions and 4,519 deletions.
Binary file added
BIN
+9.68 KB
...ml/_images/08d49036540c9957200d6f34395f42b7d1a310ea09bbbaa72b27c489b812c8df.png
Binary file added
BIN
+189 KB
...ml/_images/12ef10bb95d3d9ed719ab5cdb123e4b00a052cdc69e6271b106b89911f3b26ea.png
Binary file added
BIN
+12.8 KB
...ml/_images/18eccb808d0ea7fc5221f7f10beb33e614f64f97afdc7afef10684b806ab6409.png
Binary file added
BIN
+953 KB
...ml/_images/1bdd01b41fc4597e7129861b636621f708fee81092c326715fdd465e4a35dc6d.png
Binary file added
BIN
+28.1 KB
...ml/_images/23c4f4af3ea379192ab719887e9a05d2a969c5f90f678b8a42e705f1d169faaf.png
Binary file added
BIN
+128 KB
...ml/_images/25f034102d3d04ff4239e28c7319b72a69dc66e85abc2112f0e27a4b180b1a5b.png
Binary file added
BIN
+13.2 KB
...ml/_images/2b452bb2df37608399530dd3cd68d42be053efe8a1667fbc183c81b0d4ef4384.png
Binary file added
BIN
+54.5 KB
...ml/_images/33efd20978a31d42a63646b991a35707a7eeed1f0c2b74fe9d7a6f02f30c2e7d.png
Binary file added
BIN
+66.2 KB
...ml/_images/3ee7306422c779b6659811cc21ec829613b2eab99bfacc2bd23c9462a6cbdd57.png
Binary file added
BIN
+81.7 KB
...ml/_images/444f4db53e7beaf1960b89b5249eb8beb92a83d724bba94fdbb5d643a14f454b.png
Binary file added
BIN
+30.7 KB
...ml/_images/46ea1cd4b599783df8b40c800411b04ce464d2f47dd180c9d582bd18a1e4e329.png
Binary file added
BIN
+855 KB
...ml/_images/56b195b136d2857a98349d9834dc3b31c1848316203cf5ccc9702d69c875105a.png
Binary file added
BIN
+9.66 KB
...ml/_images/5781ede4f7cb367a1d08e2441fe75764b847ad8f43db8ccb40448d6b6e2e0997.png
Binary file added
BIN
+10.6 KB
...ml/_images/5f9d4cc5d85537d619cc74ad5d45719d1ad9e1b6158c2c6f50ec1ab816818616.png
Binary file added
BIN
+54.5 KB
...ml/_images/6b18b1ea09ffd343385166ccd572b6fb78201175afc62b4bd7a9313a4301494a.png
Binary file added
BIN
+20.5 KB
...ml/_images/70ce7630b7dfcc4381b44e88a45c47d1d9ecbde75c7fc612bf348b363ac1ee32.png
Binary file added
BIN
+21.3 KB
...ml/_images/71aaee0c789e2352bc16a2611a74e0bf52d4c4d3e763d62291e4ee33ace1c1f4.png
Binary file added
BIN
+2.47 KB
...ml/_images/761c57da56fc2f406bb6487cad7620ad6b952cf7aa8f26c4349f7641c886613a.png
Binary file added
BIN
+16.5 KB
...ml/_images/7b6d84c9e893082bb708830ee867d64ba5bcdb021ce1b32dfb4d0b2111e046cc.png
Binary file added
BIN
+67.4 KB
...ml/_images/832b75ae66bf569b0202848a039aef7c090cba6e1d3d8100705a667dc14f70c6.png
Binary file added
BIN
+28.5 KB
...ml/_images/87105c30549676f9eee5de44a3ce4b33d158474b2f2af7b94bb4ea09c5c0aa06.png
Binary file added
BIN
+24.3 KB
...ml/_images/89d3e50c8ed63a00b4a44f295def33051357b7098e1936e885d190b8a061c319.png
Binary file added
BIN
+114 KB
...ml/_images/8e3984e3ef0e94f3899c2392444747bca48a8843e14cbff9c0788545abcb815b.png
Binary file added
BIN
+110 KB
...ml/_images/97c273f817f7679d4622eb0573b35c1de35c4b1be4a541eda3124f0bb3bb464b.png
Binary file added
BIN
+16.4 KB
...ml/_images/995ce92628a22b641900518889b0ca04d82329a6e8c5a7d7cb7016e8358ccbbc.png
Binary file added
BIN
+206 KB
...ml/_images/9df14ddc8212834ad0c1fb474f33b2b517c498eb7780833d6d584e453c9c2936.png
Binary file added
BIN
+12.7 KB
...ml/_images/9f39a0db4db2401abed4e47e79862cb82701b1f3ab4bc90611728b182f0e6d0a.png
Binary file added
BIN
+83.2 KB
...ml/_images/a09b683b84921912029eb7a5ec054c54cb08d92f0f50a3be4c97a03443977174.png
Binary file added
BIN
+20.7 KB
...ml/_images/b715e5b3a491d16c483a7a4f05fb1677820477e340a53c52de5fe9d506d92f4a.png
Binary file added
BIN
+12.2 KB
...ml/_images/bb54823e7a5563ab618c6e19cf6c6889d9d865d49459c7947c9c4d1541c10db7.png
Binary file added
BIN
+111 KB
...ml/_images/bee3cf0a11cd762de324c61ffc6b6ce8a5ebbc059a75b4864bc998e7a4c14d41.png
Binary file added
BIN
+24.2 KB
...ml/_images/c0fde4b77fd3b890d1b7bfb086bbae86550f43d34ab480b81895d488a034aeed.png
Binary file added
BIN
+55.9 KB
...ml/_images/c12b0a5113c352c4d39f837d07c32dcdac8cb93694026c7ba11108a7986daee5.png
Binary file added
BIN
+22.4 KB
...ml/_images/c12b9bbfd5f7e79bd15929a8fd9eabadd5fba7c7e46a6d5f8daea66885db1de6.png
Binary file added
BIN
+128 KB
...ml/_images/ccb623afb61d0ee82c47b75cb21bfe7d24c155a8734ed7cc85d358599903e7b2.png
Binary file added
BIN
+23.6 KB
...ml/_images/de61acf9e7585e77081e483114436fa3364da9e96f0ea8af6fb3021a8b410b55.png
Binary file added
BIN
+57.2 KB
...ml/_images/e29f6036ba2adf73d7306448a52274410a5ca6ed408aa1677db7f42047163323.png
Binary file added
BIN
+9.05 KB
...ml/_images/e583ea0ad9417e6aee90043d6326244d9d991d113ea7f6d7e84f2f519d6c4356.png
Binary file added
BIN
+13.6 KB
...ml/_images/e8dcdd0a61f2081d3ede20553e2f95abfbf65359ef4b67e507db942192f72e0e.png
Binary file added
BIN
+189 KB
...ml/_images/ea90a58d8d7587247bee01846740b3a8d5c00ea2fb8c280ca9b5b0de20401a11.png
Binary file added
BIN
+856 KB
...ml/_images/eaaf0233d2c9c6cef30ffff796ea6c970fde79de4fb09613f47b63014476ad87.png
Binary file added
BIN
+11.1 KB
...ml/_images/eb0859327f0fd6d53adb2fec5dddfa0c97ed4947ee38d4ea70cde7c63c294e9e.png
Binary file added
BIN
+24.7 KB
...ml/_images/ee2d8ebe7f9e943c9e78604083da8984496fbfecd2e1fceec6ae42f36ee81057.png
Binary file added
BIN
+1.92 KB
...ml/_images/f21f7157f9e53e328a39c9c2930883daa1d7237f9fe5e9854c0e975475e09ee2.png
Binary file added
BIN
+114 KB
...ml/_images/f6ae63ff6378e80a760c3a7cf2ef4f664f17fd69942f1bec4501552036c33fc9.png
Binary file added
BIN
+9.08 KB
...ml/_images/f9c1f3f801cbd76c970e2a47aa555d983cc493d6f1621521253bbf9657d16ae2.png
964 changes: 964 additions & 0 deletions
book/_build/html/_sources/convnet_classifier_pytorch.ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells":[{"cell_type":"markdown","metadata":{"id":"cbqX9TMrgYpE"},"source":["# Working with data"]}],"metadata":{"kernelspec":{"name":"python3","language":"python","display_name":"Python 3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells":[{"cell_type":"markdown","metadata":{"id":"0zHXSDQI6FyT"},"source":["<a href=\"https://colab.research.google.com/github/smart-stats/ds4bio_book/blob/main/book/data_advanced_databases.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n","\n","# HDF5\n","\n","You've probably already learned about some variety of database:\n","SQL, NoSQL, Spark, a cloud db, ... We covered sqlite last chapter. Often,\n","the backend of these databases can be quite complicated, while the\n","front end requires SQL queries or something similar. We'll look at a\n","non-relational database format that is specifically useful for\n","scientific computing called hdf5. HDF5 has implementations in many\n","languages, but we'll look at python. This is a hierarchical data\n","format specifically useful for large array calculations.\n","\n","Let's create a basic HDF5 file with h5py. First, let's load our stuff."]},{"cell_type":"code","metadata":{"id":"9ZtaYNrj6Fyb"},"source":["import numpy as np\n","import h5py"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"p5yrVspO6Fye"},"source":["Now, let's create an empty hdf5 file. Here's the basic code; the\n","option `w` opens the file for writing, truncating it if it already exists.\n","There's also `w-`, `r`, `r+`, `a` for create (fail if the file exists),\n","read only, read/write, and read/write (create if missing). The\n","first time I ran it I used:"]},{"cell_type":"code","metadata":{"id":"C49xsIVf6Fyf"},"source":["f = h5py.File('sensor.hdf5', 'w')"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"39jxCNSm6Fyf"},"source":["Then, subsequently"]},{"cell_type":"code","metadata":{"id":"EtYpCqUU6Fyg"},"source":["#| eval: false\n","f = h5py.File('sensor.hdf5', 'r+')"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ZjxFh5J76Fyh"},"source":["Now let's populate it with some data. 
The hdf5 file works almost like\n","a directory where we can store hierarchical data. For example, suppose\n","that we want sensors stored in a superstructure called `sensors` and\n","want to fill in the data for `sensor1` and `sensor2`."]},{"cell_type":"code","metadata":{"id":"FvFPY3oU6Fyh"},"source":["f['sensors/sensor1'] = np.random.normal(size = 1024)\n","f['sensors/sensor2'] = np.random.normal(size = 1024)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"VjI8JM416Fyi"},"source":["Now we can do normal `np` stuff on these sensors. However, hdf5 is only\n","bringing in the part that we are using into memory. This allows us to\n","work with very large files. Also, as we show here, you can assign the\n","dataset to a variable since that's more convenient."]},{"cell_type":"code","metadata":{"id":"sqLuWexJ6Fyj"},"source":["s1 = f['sensors/sensor1']\n","s2 = f['sensors/sensor2']"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"hxbwpN256Fyk"},"source":["## Blockwise basic statistical calculations\n","Now, consider taking the mean of both variables. Imagine that the time\n","series is so long it's not feasible to load into memory. So, we want\n","to read it in blocks. You want your blocks to be as big as memory allows,\n","since that's fastest. In our case, of course, none of this is\n","necessary.\n","\n","Our goal in this section is to do the following: calculate the\n","empirical mean and variance for each sensor, center and scale each\n","sensor and write those changes back to those variables, calculate the\n","sample correlation, and then calculate the residual for sensor1 given\n","sensor2. (I think typically you wouldn't want to overwrite the\n","original data; but, this is for pedagogical purposes.) We want our\n","data organized so sensors are stored in a hierarchical \"folder\" called\n","sensors and processed data is in a different folder.\n","\n","We're just simulating iid standard normals. 
So, we have a rough idea\n","of the answers we should get, since the data are theoretically\n","mean 0, variance 1 and uncorrelated. After our calculations, they will\n","have empirical mean 0 and variance 1 and the empirical correlation\n","between the residual and sensor 2 will be 0.\n","\n","Let's consider a block variation of the inner product.\n","\n","$$\n","<a, b> = \\sum_{i=0}^{n-1} a_i b_i = \\sum_{i=0}^{n/B - 1} \\sum_{j=0}^{B-1} a_{j + i B} b_{j + i B}\n","$$\n","\n","(if $n$ is divisible by $B$. Otherwise you have to figure out what to do with the final block, which isn't hard but makes the notation messier.) So, for example, the (sample) mean is then $<x, J>/n$ where $J$ is a vector of ones.\n","\n","Let's calculate the mean using blockwise calculations."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Jwe-4O4S6Fyk","executionInfo":{"status":"ok","timestamp":1713271323038,"user_tz":240,"elapsed":138,"user":{"displayName":"Brian Caffo","userId":"07979705296072332292"}},"outputId":"cc10a0ce-0dcc-455d-8af3-600582a336ba"},"source":["n = s1.shape[0]\n","B = 32\n","## accumulate the mean of each sensor block by block\n","mean1 = 0\n","mean2 = 0\n","for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," mean1 += s1[block_indices].sum() / n\n"," mean2 += s2[block_indices].sum() / n\n","\n","[mean1, mean2]"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[-0.03648099576614891, -0.01236374986065932]"]},"metadata":{},"execution_count":13}]},{"cell_type":"markdown","metadata":{"id":"r7F1ZOfZ6Fyl"},"source":["Let's now center our time series."]},{"cell_type":"code","metadata":{"id":"ptxnou_86Fyl"},"source":["for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," s1[block_indices] -= mean1\n"," s2[block_indices] -= mean2"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"NIi1UXGz6Fym"},"source":["Now the (unbiased, sample) variance of 
a centered vector $a$ is simply $<a, a>/(n-1)$."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ESUei8em6Fym","executionInfo":{"status":"ok","timestamp":1713271327058,"user_tz":240,"elapsed":14,"user":{"displayName":"Brian Caffo","userId":"07979705296072332292"}},"outputId":"13a0f842-f849-44bc-bf5d-6a5ae4ef5f9d"},"source":["v1, v2 = 0, 0\n","for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," v1 += np.sum(s1[block_indices] ** 2) / (n - 1)\n"," v2 += np.sum(s2[block_indices] ** 2) / (n - 1)\n","[v1, v2]"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[0.9796337756284553, 0.9316733839552979]"]},"metadata":{},"execution_count":15}]},{"cell_type":"markdown","metadata":{"id":"vmNqIcEm6Fym"},"source":["Now let's scale our vectors by their standard deviations."]},{"cell_type":"code","metadata":{"id":"8ViKBZuJ6Fyn"},"source":["sd1 = np.sqrt(v1)\n","sd2 = np.sqrt(v2)\n","## divide by the standard deviation (not the variance) so each series has unit variance\n","for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," s1[block_indices] /= sd1\n"," s2[block_indices] /= sd2"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"z7st-8Cr6Fyn"},"source":["Now that our vectors are centered and scaled, the empirical correlation is simply $<a, b>/(n-1)$. 
Let's do that."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"bbUmONqb6Fyo","executionInfo":{"status":"ok","timestamp":1713271330791,"user_tz":240,"elapsed":192,"user":{"displayName":"Brian Caffo","userId":"07979705296072332292"}},"outputId":"96654807-2113-4e4d-fd02-31b9c5177680"},"source":["cor = 0\n","for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," cor += np.sum(s1[block_indices] * s2[block_indices]) / (n-1)\n","cor"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["-0.020708540657507272"]},"metadata":{},"execution_count":17}]},{"cell_type":"markdown","metadata":{"id":"OkhQ_8zB6Fyo"},"source":["Finally, we want to \"regress out\" s2 from s1. Since we normalized our series, the correlation is the slope coefficient from linear regression (regardless of which series is the outcome and which is the predictor) and the intercept is zero (since we centered). Thus, the residual we want is $e_{12} = s_1 - \\rho s_2$ where $\\rho$ is the correlation."]},{"cell_type":"code","metadata":{"id":"a7pGkt_V6Fyp"},"source":["f['processed/resid_s1_s2'] = np.empty(n)\n","e12 = f['processed/resid_s1_s2']\n","## assign (rather than +=) since np.empty leaves the dataset uninitialized\n","for i in range(int(n/B)):\n"," block_indices = np.array(range(B)) + i * B\n"," e12[block_indices] = s1[block_indices] - cor * s2[block_indices]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"vU9ZFzqn6Fyq"},"source":["Now we have our new processed data stored in a vector. 
To close our\n","database simply do:"]},{"cell_type":"code","metadata":{"id":"FbDa-Pgd6Fyq"},"source":["f.close()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"j361azRJ6Fyq"},"source":["Now our processed data is stored on disk."]},{"cell_type":"code","metadata":{"id":"2mxaoPSP6Fyr"},"source":["f = h5py.File('sensor.hdf5', 'r')\n","f['processed/resid_s1_s2']"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["We can close the database with the method `.close()` as follows."],"metadata":{"id":"B0pncZs19l3U"}},{"cell_type":"code","metadata":{"id":"m9pI9lOT6Fyr"},"source":["f.close()"],"execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"name":"python3","language":"python","display_name":"Python 3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0} |
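The notebook above notes that the final block needs separate handling when $n$ is not divisible by $B$. A minimal sketch of one way to do that is below. It is not code from the commit: the helper name `blockwise_inner` is hypothetical, and plain NumPy arrays stand in for the HDF5 datasets so that the example runs without a file on disk. Contiguous slices such as `a[start:stop]` are used instead of index arrays, and the last slice is simply shorter than the others.

```python
import numpy as np


def blockwise_inner(a, b, block_size):
    """Inner product <a, b>, reading one contiguous block at a time.

    Hypothetical helper for illustration; a and b only need to support
    slicing (NumPy arrays here, standing in for on-disk datasets).
    """
    n = a.shape[0]
    total = 0.0
    # Block starts are 0, B, 2B, ...; the final block is shorter
    # whenever n is not a multiple of block_size.
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        total += float(np.dot(a[start:stop], b[start:stop]))
    return total


rng = np.random.default_rng(0)
n = 1000  # deliberately not a multiple of the block size
x = rng.normal(size=n)
y = rng.normal(size=n)
ones = np.ones(n)

# Sample mean as <x, J> / n, with J a vector of ones.
mean_x = blockwise_inner(x, ones, 32) / n

# Unbiased sample variance of the centered vector as <a, a> / (n - 1).
xc = x - mean_x
var_x = blockwise_inner(xc, xc, 32) / (n - 1)

print(mean_x, var_x)
```

Stepping `range` by the block size avoids computing `n / B` at all, so the divisible and non-divisible cases share one code path.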
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"cells":[{"cell_type":"markdown","metadata":{"id":"uEeRdv7HGf8d"},"source":["# Graphics"]}],"metadata":{"kernelspec":{"name":"python3","language":"python","display_name":"Python 3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0} |