DOCS

furkanmtorun · Jul 8, 2022 · b362745 · b362745
1 parent 1726931
commit b362745
Showing 31 changed files with 546 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-<p align="center"> <img src="https://user-images.githubusercontent.com/49681382/101802266-48204a00-3b20-11eb-85ec-08c123fca79e.png" height="270" width="277" /> </p>
+<p align="center"> <img src="omiclearn.png" height="270" width="277" /> </p>
 <h2 align="center"> 📰 Manual and Documentation is available at: <a href="https://github.com/MannLabs/OmicLearn/wiki" target="_blank">OmicLearn Wiki Page </a> </h2>
 
 ![OmicLearn Tests](https://github.com/MannLabs/OmicLearn/workflows/OmicLearn%20Tests/badge.svg)
@@ -10,7 +10,7 @@
 ---
 ## OmicLearn
 
-Transparent exploration of machine learning for biomarker discovery from proteomics and omics data. This is a maintained fork from https://github.com/OmicEra/OmicLearn.
+Transparent exploration of machine learning for biomarker discovery from proteomics and omics data. This is a maintained fork from [OmicEra](https://github.com/OmicEra/OmicLearn).
 
 
 ## Manuscript
@@ -57,7 +57,7 @@ Click on one of the links below to download the latest release for:
 
 The following image displays the main steps of OmicLearn:
 
-![OmicLearn Workflow](https://user-images.githubusercontent.com/49681382/91734594-cb421380-ebb3-11ea-91fa-8acc8826ae7b.png)
+![OmicLearn Workflow](workflow.png)
 
 Detailed instructions on how to get started with OmicLearn can be found **[here.](https://github.com/MannLabs/OmicLearn/wiki/HOW-TO:-Using)**
 
@@ -70,4 +70,4 @@ All contributions are welcome. 👍
 
 When contributing to **OmicLearn**, please **[open a new issue](https://github.com/MannLabs/OmicLearn/issues/new/choose)** to report the bug or discuss the changes you plan before sending a PR (pull request).
 
-We appreciate community contributions to the repository., we ensure that the community is free to use your contributions.  🤝
+We appreciate community contributions to the repository.
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
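
Review note on `docs/Makefile` and `docs/make.bat`: both files forward any target to `sphinx-build -M <target> source build`. The following is a minimal sketch of the same call from Python, assuming Sphinx is installed and the working directory layout matches this commit. The helper names `sphinx_command` and `build_docs` are hypothetical and not part of OmicLearn.

```python
import subprocess


def sphinx_command(target, sourcedir="source", builddir="build",
                   sphinxbuild="sphinx-build", opts=()):
    """Return the argument list that the Makefile would run for a target."""
    # Mirrors: $(SPHINXBUILD) -M <target> "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)
    return [sphinxbuild, "-M", target, sourcedir, builddir, *opts]


def build_docs(target="html", cwd="docs"):
    """Run the build from the docs directory; requires Sphinx to be installed."""
    return subprocess.run(sphinx_command(target), cwd=cwd, check=True)


# Show the command without executing it.
print(" ".join(sphinx_command("html")))
```

Calling `build_docs()` from the repository root would then be equivalent to running `make html` inside `docs/`.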
diff --git a/docs/source/METHODS.md b/docs/source/METHODS.md
diff --git a/docs/source/README.rst b/docs/source/README.rst
@@ -0,0 +1 @@
+.. include:: ../../README.md
diff --git a/docs/source/USING.md b/docs/source/USING.md
@@ -0,0 +1,147 @@
+## Using OmicLearn
+**OmicLearn** enables researchers and scientists to explore the latest machine learning (ML) algorithms for use in proteomics/transcriptomics.
+
+The core steps of the pipeline are `Preprocessing`, `Missing Value Imputation`, `Feature Selection`, `Classification`, and `Validation` of the selected methods/algorithms; they are presented in the flowchart below:
+
+![OmicLearn Workflow](workflow.png)
+
+_**Figure 1:** Main steps for the workflow of OmicLearn at a glance_
+
+## Uploading data
+
+Your own data can be uploaded by dragging and dropping a file onto the file menu or by clicking the link.
+The data should be formatted according to the following conventions:
+
+> - The file format should be `.xlsx (Excel)`, `.csv (Comma-separated values)` or `.tsv (tab-separated values)`.  For `.csv`, the separator should be either `comma (,)` or `semicolon (;)`.
+>
+> - Maximum file size is 200 MB.
+>
+> - 'Identifiers' such as protein IDs, gene names, lipids or miRNA IDs should be uppercase.
+>
+> - Each row corresponds to a sample, each column to a feature.
+>
+> - Additional features should be marked with a leading underscore (`_`).
+
+![DATA_UPLOAD/SELECTION](upload.png)
+
+_**Figure 2:** Uploading a dataset or selecting a sample file_
+
+The data will be checked for consistency, and if your dataset contains missing values (`NaNs`), a notification will appear.
+Then, you might consider using the methods listed on the left sidebar for the imputation of missing values.
+
+![NAN_WARNING](nan_warning.png)
+
+_**Figure 3:** Missing value warning_
+
+
+### Sample Datasets
+
+OmicLearn includes several sample [datasets](https://github.com/MannLabs/OmicLearn/tree/master/data) for exploring the analysis; they can be selected from the dropdown menu.
+
+Here is the list of sample datasets available:
+
+**`1. Alzheimer Dataset`**
+> 📁 **File Name:** Alzheimer.xlsx
+>
+> 📖 **Description:** Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer's disease
+>
+> 🔗 **Source:** Bader, J., Geyer, P., Müller, J., Strauss, M., Koch, M., & Leypoldt, F. et al. (2020). Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer's disease. Molecular Systems Biology, 16(6). doi: [10.15252/msb.20199356](http://doi.org/10.15252/msb.20199356).
+
+**`2. Sample Dataset`**
+> 📁 **File Name:** Sample.xlsx
+>
+> 📖 **Description:** Sample dataset for testing the tool
+>
+> 🔗 **Source:** -
+
+## Sidebar: Selecting Parameters
+
+OmicLearn offers a large variety of options, which are detailed in the [methods wiki](https://github.com/MannLabs/OmicLearn/wiki/METHODS). The parameters can be selected in the sidebar.
+
+Moreover, after changing the parameters, you are asked to re-run the analysis. Each analysis result will be stored in the [`Session History` section](#checking-the-session-history).
+
+![OmicLearn SideBar](sidebar.png)
+
+_**Figure 4:** OmicLearn sidebar options_
+
+## Main Window: Selecting data, defining the workflow, and exploring results
+
+### Data Selection
+
+After uploading the data, the data will be displayed within the OmicLearn window and can be explored. The dropdown menu `Subset` allows you to specify a subset of data based on values within a column. This way, you can exclude data that should not be used at all. An example use case could be that you collected data from different sites and want to exclude a site.
+
+![Subset](subset.png)
+
+_**Figure 5:** Example usage for `Subset` section_
+
+Within `Features`, you should select the target column. This refers to the variable that the classifier should be able to distinguish. As we are performing a binary classification task, there are only two options for the outcome of the classifier. By assigning multiple values to a class, multiple combinations of classifications can be tested.
+
+![Classification target](target.png)
+
+_**Figure 6:** `Classification target` section for selecting the target columns and `Define classes` section for assigning the classes_
+
+Furthermore, `Additional Features` can be selected. This refers to columns that are not your identifiers (such as protein IDs, gene names, lipids or miRNA IDs); they are not uppercase and have a leading underscore (`_`).
+
+![Add Features](additional.png)
+
+_**Figure 7:** Sample case for `Additional Features` option_
+
+The section `Exclude identifiers` enables users to exclude selected features manually. This can be useful, e.g., when you want to assess performance without a top feature. There is also an option to upload a file with multiple features that should be excluded.
+
+> To utilize this option, you should upload a CSV (comma `,` separated) file where each row corresponds to a feature to be excluded. Also, the file should include a header (title row).
+
+![exclude_identifiers](exclude.png)
+
+_**Figure 8:** Example usage for `Exclude identifiers` section_
+
+The option `Cohort comparison` allows comparing results across different cohorts (i.e., training on one cohort and predicting on another).
+
+![dataselections](selection.png)
+
+_**Figure 9:** Selections on the dataset_
+
+### Running the Workflow
+After selecting all parameters, you can execute the workflow by clicking the `Run Analysis` button.
+
+### Analysis results and plots
+Once the analysis is completed, OmicLearn automatically generates the plots together with a table showing the results of each validation run. The plots can be downloaded in `.pdf` and `.svg` formats in addition to the `.png` format provided by Plotly.
+
+![FeatAtt_Chart](feature_importance.png)
+
+![FeatAtt_Table](feature_importance_table.png)
+
+_**Figure 10:** Bar chart for feature importance values received from the classifier after all cross-validation runs, its table containing links to NCBI search and download options_
+
+![ROC Curve](roc_curve.png)
+
+![PR Curve](pr_curve.png)
+
+_**Figure 11:** Receiver operating characteristic (ROC) Curve, Precision-Recall (PR) Curve and download options_
+
+![CONF-MATRIX](confusion.png)
+
+_**Figure 12:** Confusion matrix, slider for looking at the other matrix tables and download options_
+
+OmicLearn generates a `Summary` to describe the method. This can be used for a method section in a publication.
+
+![Results table](summary.png)
+
+![summary text](summary_text.png)
+
+_**Figure 13:** Results table of the analysis, its download option, and auto-generated `Summary` text_
+
+### Checking the Session History
+
+Each analysis run will be appended to the `Session History` so that you can investigate the different results for different parameter sets.
+
+![session](session_history.png)
+
+_**Figure 14:** Session history table and download option_
+
+## Cite us & Report bugs
+
+At the end of the analysis, you will find a footnote for reporting bugs. There is also information on how to cite OmicLearn in your work if you find it useful.
+
+![bug_report](bugs.png)
+
+_**Figure 15:** Tabs for Citation and Bug Reporting_
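
Review note on the upload conventions in `USING.md`: the rules (comma or semicolon separator, uppercase identifiers, additional features with a leading underscore, a warning on missing values) can be checked before uploading. The following is a minimal sketch for a CSV file using only the standard library. It is not OmicLearn's own validation code; the helper `check_table` and the sample table are hypothetical.

```python
import csv
import io


def check_table(text):
    """Summarize a CSV table against the upload conventions (hypothetical helper)."""
    # The conventions allow a comma or a semicolon as the CSV separator.
    header_line = text.splitlines()[0]
    delimiter = ";" if header_line.count(";") > header_line.count(",") else ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, samples = rows[0], rows[1:]

    # Additional features carry a leading underscore; identifiers are uppercase.
    additional = [c for c in header if c.startswith("_")]
    identifiers = [c for c in header if not c.startswith("_") and c == c.upper()]
    unexpected = [c for c in header if c not in additional and c not in identifiers]

    # Count empty or NaN cells, i.e. values that would need imputation.
    missing = sum(
        1 for row in samples for cell in row if cell.strip().lower() in ("", "nan")
    )
    return {
        "delimiter": delimiter,
        "n_samples": len(samples),  # each row corresponds to a sample
        "identifiers": identifiers,
        "additional": additional,
        "unexpected": unexpected,
        "missing": missing,
    }


# Made-up example table: two identifiers, one additional feature, one misnamed column.
example = "P12345;Q67890;_age;site\n1.2;;63;A\n0.8;2.1;nan;B\n"
report = check_table(example)
print(report)
```

Columns reported under `unexpected` are neither uppercase identifiers nor underscore-prefixed additional features, so they would need renaming to follow the conventions.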
diff --git a/docs/source/VERSION-HISTORY.md b/docs/source/VERSION-HISTORY.md
@@ -0,0 +1,58 @@
+## Version History
+
+On this page, you can find the list of previous releases of **OmicLearn** together with the notes and significant changes for each version.
+
+### - `v1.1.3`
+
+> 📅  June 2022
+>
+> This is the latest release of **OmicLearn**.
+>
+> **Updates in this release:**
+>
+> - [x] Several packages are upgraded to their latest version!
+> - [x] A code prettifier and formatter are used to make the code more readable!
+> - [x] One-click installers
+>
+
+### - `v1.1.2`
+
+> 📅  February 2022
+>
+>
+> **Updates in this release:**
+>
+> - [x] TSV (Tab [\t] Separated Value) File support has been added!
+>
+
+### - `v1.1.1`
+
+> 📅  October 2021
+>
+>
+> **Updates in this release:**
+>
+> - [x] The dependencies are upgraded to the latest versions if possible.
+> - [x] The codebase has been updated for UI changes in Streamlit.
+>
+
+
+### - `v1.1.0`
+
+> 📅 July 2021
+>
+>
+> **Updates in this release:**
+> - [X] As Exploratory Data Analysis (EDA) is crucial to gain insight into the data,  PCA and Hierarchical clustering options are provided.
+> - [X] Improvements on UI & UX have been made.
+> - [X] The user interface has been updated to make it easier to follow.
+> - [X] Streamlit and other packages/libraries are upgraded to their new versions.
+>
+
+<br>
+
+### - `v1.0.0`
+
+> 📅 March 2021
+>
+> This is the initial release of **OmicLearn**.
diff --git a/docs/source/additional.png b/docs/source/additional.png
diff --git a/docs/source/bugs.png b/docs/source/bugs.png
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -0,0 +1,54 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'OmicLearn'
+copyright = '2022, Furkan Torun, Maximilian Strauss'
+author = 'Furkan Torun, Maximilian Strauss'
+
+# The full version, including alpha/beta/rc tags
+release = '1.1.3'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['myst_parser']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
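
Review note on `docs/source/conf.py`: the configuration depends on the `myst_parser` extension and the `sphinx_rtd_theme` theme, and neither appears in the `reqs.txt` hunk of this commit. The following is a minimal sketch for checking that both are importable before building. The helper `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules referenced by conf.py in this commit.
required = ["myst_parser", "sphinx_rtd_theme"]
missing = missing_modules(required)
if missing:
    print("Install before building the docs:", ", ".join(missing))
else:
    print("All documentation dependencies are importable.")
```

Running this in the environment used for `make html` gives an early hint if the build would fail on a missing extension or theme.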
diff --git a/docs/source/confusion.png b/docs/source/confusion.png
diff --git a/docs/source/exclude.png b/docs/source/exclude.png
diff --git a/docs/source/feature_importance.png b/docs/source/feature_importance.png
diff --git a/docs/source/feature_importance_table.png b/docs/source/feature_importance_table.png
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -0,0 +1,24 @@
+.. OmicLearn documentation master file, created by
+   sphinx-quickstart on Fri Jul  8 23:02:10 2022.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to OmicLearn's documentation!
+=====================================
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   README.rst
+   USING.md
+   METHODS.md
+   VERSION-HISTORY.md
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/docs/source/nan_warning.png b/docs/source/nan_warning.png
diff --git a/docs/source/pr_curve.png b/docs/source/pr_curve.png
diff --git a/docs/source/roc_curve.png b/docs/source/roc_curve.png
diff --git a/docs/source/selection.png b/docs/source/selection.png
diff --git a/docs/source/session_history.png b/docs/source/session_history.png
diff --git a/docs/source/sidebar.png b/docs/source/sidebar.png
diff --git a/docs/source/subset.png b/docs/source/subset.png
diff --git a/docs/source/summary.png b/docs/source/summary.png
diff --git a/docs/source/summary_text.png b/docs/source/summary_text.png
diff --git a/docs/source/target.png b/docs/source/target.png
diff --git a/docs/source/upload.png b/docs/source/upload.png
diff --git a/docs/source/workflow.png b/docs/source/workflow.png
diff --git a/docs/source/workflow0.png b/docs/source/workflow0.png
diff --git a/omiclearn.png b/omiclearn.png
diff --git a/reqs.txt b/reqs.txt
@@ -1,11 +1,11 @@
-streamlit==1.8.1
-pandas==1.4.2
-numpy==1.22.3
-scikit-learn==1.0.2
-openpyxl==3.0.9
-plotly==5.7.0
+streamlit==1.10.0
+pandas==1.4.3
+numpy==1.23.0
+scikit-learn==1.1.1
+openpyxl==3.0.10
+plotly==5.9.0
 kaleido==0.2.1
-XlsxWriter==3.0.1
-watchdog==2.1.7
+XlsxWriter==3.0.3
+watchdog==2.1.9
 xgboost
 protobuf==3.20
diff --git a/workflow.png b/workflow.png