From 3a5daa3dc6f0873d9af0960d10754b448b18ed4f Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 8 Jul 2024 13:10:10 +0000 Subject: [PATCH 1/7] Adding slides for new version of swd3 --- index.qmd | 1 + swd3_2024.qmd | 24 ++++ swd3_2024/good-practices.qmd | 241 +++++++++++++++++++++++++++++++++++ swd3_2024/project.qmd | 183 ++++++++++++++++++++++++++ swd3_2024/sdlc.qmd | 163 +++++++++++++++++++++++ swd3_2024/structure.qmd | 64 ++++++++++ 6 files changed, 676 insertions(+) create mode 100644 swd3_2024.qmd create mode 100644 swd3_2024/good-practices.qmd create mode 100644 swd3_2024/project.qmd create mode 100644 swd3_2024/sdlc.qmd create mode 100644 swd3_2024/structure.qmd diff --git a/index.qmd b/index.qmd index 4476ecf..ee7799a 100644 --- a/index.qmd +++ b/index.qmd @@ -22,6 +22,7 @@ the link will be provided. - [ ] SWD1b: Introduction to R programming - [ ] SWD2: Version Control with Git and GitHub - [x] [SWD3: Software development practices for Research](./swd3.qmd) +- [x] [SWD3 2024: Software development practices for Research using Cloud](./swd3_2024.qmd) - [ ] SWD4: Cloud computing for Research - [ ] SWD5: Scientific Python - [ ] SWD6: High performance Python diff --git a/swd3_2024.qmd b/swd3_2024.qmd new file mode 100644 index 0000000..17b7ec1 --- /dev/null +++ b/swd3_2024.qmd @@ -0,0 +1,24 @@ +--- +format: + clean-revealjs: + self-contained: true + navigation-mode: linear + controls-layout: bottom-right + controls: false + footer: "[Research IT Website]({{< var rc.website >}}) | [Research IT Query]({{< var rc.servicedesk >}}) | [Courses Material]({{< var rc.material >}})" +name: Software development practices for Research +code: SWD3_2024 +--- + +{{< include _title.qmd >}} +{{< include _team.qmd >}} + +## Useful Links + +- [GitHub Project Demo](https://github.com/ARCTraining/swd3-demo) +- [Alan Turing Institute - Research Software Engineering Course Material](https://alan-turing-institute.github.io/rse-course/html/index.html) + +{{< include 
swd3_2024/sdlc.qmd >}} +{{< include swd3_2024/structure.qmd >}} +{{< include swd3_2024/good-practices.qmd >}} +{{< include swd3_2024/project.qmd >}} diff --git a/swd3_2024/good-practices.qmd b/swd3_2024/good-practices.qmd new file mode 100644 index 0000000..04c906d --- /dev/null +++ b/swd3_2024/good-practices.qmd @@ -0,0 +1,241 @@ +## Virtual Environments {.smaller} + +If application A needs version 1.0 of a particular module but application B +needs version 2.0, then the requirements are in conflict and installing either +version 1.0 or 2.0 will leave one application unable to run. + +The solution for this problem is to create a virtual environment, a +self-contained directory tree that contains installation for particular versions +of software/packages. + +### Conda + +- [Conda](https://docs.conda.io/en/latest/) is an open source package management +system and environment management system that runs on Windows, macOS, and Linux. +- It offers dependency and environment management for any language—Python, R, +Ruby, Lua, Scala, Java, JavaScript, C/ C++, Fortran, and more. +- Easy user install via [Anaconda](https://www.anaconda.com/download). + + +## Code formatting + +```python +# myscript.py: +x = { 'a':37,'b':42, +'c':927} +y = 'hello '+ 'world' +class foo ( object ): + def f (self ): + return y **2 + def g(self, x :int, + y : int=42 + ) -> int: + return x--y +def f ( a ) : + return 37+-a[42-a : y*3] +``` + +## Coding conventions {.smaller} + +If your language or project has a standard policy, use that. 
For example: + +- Python: [PEP8](https://www.python.org/dev/peps/pep-0008/) +- R: [Google's guide for R](https://google.github.io/styleguide/Rguide.xml), [tidyverse style guide](https://style.tidyverse.org/) +- C++: [Google's style guide](https://google.github.io/styleguide/cppguide.html) +- Julia: [Official style guide](https://docs.julialang.org/en/v1/manual/style-guide/index.html) + +## Linters + +Linters are automated tools which enforce coding conventions and check for +common mistakes. For example: + +- Python: + - [flake8](https://flake8.pycqa.org/en/latest/index.html) (flags any syntax/style errors) + - [black](https://black.readthedocs.io/) (enforces the style) + - [isort](https://pycqa.github.io/isort/) ("Sorts" imports alphabetically in groups) + +## Example: Flake8 Linter + +```bash +$ conda install flake8 +$ flake8 myscript.py +myscript.py:2:6: E201 whitespace after '{' +myscript.py:2:11: E231 missing whitespace after ':' +myscript.py:2:14: E231 missing whitespace after ',' +myscript.py:2:18: E231 missing whitespace after ':' +myscript.py:3:1: E128 continuation line under-indented for visual indent +myscript.py:3:4: E231 missing whitespace after ':' +myscript.py:4:13: E225 missing whitespace around operator +myscript.py:4:14: E222 multiple spaces after operator +myscript.py:5:1: E302 expected 2 blank lines, found 0 +myscript.py:5:13: E201 whitespace after '(' +myscript.py:5:25: E202 whitespace before ')' +myscript.py:6:4: E111 indentation is not a multiple of 4 +myscript.py:6:9: E211 whitespace before '(' +myscript.py:6:20: E202 whitespace before ')' +myscript.py:7:8: E111 indentation is not a multiple of 4 +myscript.py:7:14: E271 multiple spaces after keyword +myscript.py:7:25: E225 missing whitespace around operator +myscript.py:8:4: E301 expected 1 blank line, found 0 +myscript.py:8:4: E111 indentation is not a multiple of 4 +myscript.py:8:17: E203 whitespace before ':' +myscript.py:8:18: E231 missing whitespace after ':' +myscript.py:9:8: E128 
continuation line under-indented for visual indent +myscript.py:9:9: E203 whitespace before ':' +myscript.py:9:15: E252 missing whitespace around parameter equals +myscript.py:9:16: E252 missing whitespace around parameter equals +myscript.py:10:8: E124 closing bracket does not match visual indentation +myscript.py:10:8: E125 continuation line with same indent as next logical line +myscript.py:11:8: E111 indentation is not a multiple of 4 +myscript.py:12:1: E302 expected 2 blank lines, found 0 +myscript.py:12:6: E211 whitespace before '(' +myscript.py:12:9: E201 whitespace after '(' +myscript.py:12:13: E202 whitespace before ')' +myscript.py:12:15: E203 whitespace before ':' +myscript.py:13:4: E111 indentation is not a multiple of 4 +myscript.py:13:10: E271 multiple spaces after keyword +myscript.py:13:26: E203 whitespace before ':' +myscript.py:13:34: W291 trailing whitespace +``` + +## Example: Black Code Formatter {.smaller} + +:::{.par_botton} +Install and run Black +::: +```bash +$ conda install black +$ black myscript.py +``` + +:::{.par_botton} +Check the file! +::: +```python +# myscript.py: +x = {"a": 37, "b": 42, "c": 927} +y = "hello " + "world" + + +class foo(object): + def f(self): + return y**2 + + def g(self, x: int, y: int = 42) -> int: + return x - -y + + +def f(a): + return 37 + -a[42 - a : y * 3] +``` + +## IDE {.smaller} + +Using an Integrated development environment (IDE) will certainly save you time, but the advantages of using an IDE go beyond that. Below are some IDE advantages + +1. Syntax highlighting +2. Text autocompletion +3. Refactoring options +4. Easily Importing libraries +5. Build, compile, or run + +### Visual Studio Code + +To install VS Code follow the instructions [here](https://code.visualstudio.com/). 
+ +## VSC Example: automatically using black {.smaller} + +**Configure VSC to use Black**: Code (or File) > Preferences > Settings + +- Search for `python formatting provider` and choose `black` +- Search for `format on save` and check the box to enable + +**Select interpreter**: View > `Command Palette..` (or `Ctrl+Shift+P`) + +- Search for `Python: Select Interpreter` +- Choose the correct environment + +Now the Black package is going to fix your code's layout every time you save a +code file. + +## Version Control {.smaller} + +![[Piled Higher and Deeper by Jorge Cham](http://www.phdcomics.com)](./assets/img/git/phd101212s.png) + +## Test-driven development {.smaller} + +**Example:** suppose we need to find the result of a number divided by another number: + +::: {.panel-tabset} + +### Naive solution + +- Write a function `a_div_b`. +- Call it interactively on two or three different inputs. +- If it produces the wrong answer, fix the function and re-run that test. + +This clearly works (after all, thousands of scientists are doing it right now), but there’s a better way. + +### TDD solution + +- Write a short function for each test. +- Write an `a_div_b` function that should pass those tests. +- If `a_div_b` produces any wrong answers, fix it and re-run the test functions. + +Writing the tests before writing the function they exercise is called **test-driven development (TDD)**. +Its advocates believe it produces better code faster because: + +- If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors. +- Writing tests helps programmers figure out what the function is actually supposed to do. + +::: + +## Possible tests: `a_div_b` example {.smaller} + +Let's think about the possible scenarios for this problem and how we could test them.
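
As a minimal sketch of the TDD steps described above, the scenarios can be written as `pytest`-style test functions next to a first attempt at `a_div_b`. Only `a_div_b` and the bare `assert` checks come from the slides; the test function names and the tolerance-based check are illustrative assumptions.

```python
import math


def a_div_b(a, b):
    """Return a divided by b (first attempt, written after the tests)."""
    return a / b


def test_bigger_by_smaller():
    # A larger number divided by a smaller one gives a result above 1.
    assert a_div_b(4, 2) == 2
    assert a_div_b(8, 7) > 1


def test_smaller_by_bigger():
    # A smaller number divided by a larger one gives a result below 1.
    assert a_div_b(2, 4) == 0.5
    assert a_div_b(7, 8) < 1


def test_negative_numbers():
    # Two negative inputs give a positive result.
    assert a_div_b(-4, -2) == 2
    assert a_div_b(-4, -2) > 0


def test_non_terminating_result():
    # Hypothetical extra scenario: floats rarely compare exactly,
    # so this check allows a small relative tolerance.
    assert math.isclose(a_div_b(1, 3), 0.333333, rel_tol=1e-5)
```

Saved in a file whose name starts with `test_` (for example a hypothetical `test_a_div_b.py`), these functions should be collected when `pytest` is run in that folder.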
+ +::: {.panel-tabset} + +### Bigger by smaller + +- Using `4` and `2`, the answer should be `2`. + +```python +assert a_div_b(4, 2) == 2 +``` + +- Or... the answer should be `larger` than `1`. + +```python +assert a_div_b(8, 7) > 1 +``` + +### Smaller by bigger + +- Using `2` and `4`, the answer should be `0.5`. + +```python +assert a_div_b(2, 4) == 0.5 +``` + +- Or... the answer should be `smaller` than `1`. + +```python +assert a_div_b(7, 8) < 1 +``` + +### Negative numbers + +- Using `-4` and `-2`, the answer should be `2`. + +```python +assert a_div_b(-4, -2) == 2 +``` + +- Or... the answer should be `positive`. + +```python +assert a_div_b(-4, -2) > 0 +``` + +::: diff --git a/swd3_2024/project.qmd b/swd3_2024/project.qmd new file mode 100644 index 0000000..a9f7c6b --- /dev/null +++ b/swd3_2024/project.qmd @@ -0,0 +1,183 @@ +## Bringing it all together {.smaller} + +### The Hypotenuse Problem + +Calculating the hypotenuse + +$$ c = \sqrt{a^2 + b^2} $$ + + +General Design + +- 1 squared function +- 1 sum function +- 1 square root function +- 1 hypotenuse function that uses the other functions + + +## Workflow {.smaller} + +1. Install Git, Anaconda, VScode +2. Create a GitHub repository + Licence + .gitignore + Readme +3. Setup GH Action for testing (Python Application) +4. Clone GH repository in local machine +5. Create project structure (source and test folders) +6. Setup tests (start with `test_`) +7. Develop code +8. Add docstring (you can use `autoDocstring - Python Docstring Generator on VS Code`) +9. Lint code and tests +10. Push to github +11. EXTRA: Create Sphinx documentation +12. EXTRA: Setup file and local install +13. 
EXTRA: GH Release + +## Extra: Sphinx documentation {.smaller} + +- Create a docstring for every function +- Install `sphinx` +- Start the basic structure using: `$ sphinx-quickstart docs` +- Use the apidoc to get docstrings: `$ sphinx-apidoc -o docs .` +- Edit files: + +::: {.panel-tabset} + +### `conf.py` + +- add extensions: `'sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc'`. +- change theme: `sphinx_rtd_theme` +- add the `src` (change the folder name as necessary!) folder as path: + +```python + import os + import sys + sys.path.insert(0, os.path.abspath('../src')) +``` + +### `index.rst` + +Add extra files after `Contents` + +``` +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + dependencies + usage + functions +``` + +### `dependencies.rst` + +List all your dependencies: + +``` +Dependencies +============ + +- python +- pytest +- flake8 +- black +- sphinx +``` + +### `usage.rst` + +Explain how to use your software + +``` +Usage Guide +============ + +To start working with this repository you need to clone it onto your local +machine: :: + + $ git clone https://github.com/... + + +Next ... +``` + +### `functions.rst` + +Create a function file with the following: + +``` +API reference +============= + +.. automodule:: calc + :members: + :undoc-members: + :show-inheritance: +``` + +::: + +## Extra: documentation Action {.smaller} + +Create a new GH Action to build a website for your documentation.
+ +- The action is available [here](https://github.com/patricia-ternes/hypot-2023/blob/main/.github/workflows/documentation.yml) +- You may need to update the GH Actions permissions to allow `write` +- After a successful documentation action, you need to select the `gh-pages` branch to activate your website + +## Extra: Setup file + +Create a `setup.py` file like: + +```python +import setuptools + +with open("README.md", "r") as fh: + long_description = fh.read() + +setuptools.setup( + name="hypot", + version="0.1.0", + author="Patricia Ternes", + author_email="p.ternesdallagnollo@leeds.ac.uk", + description="The hypot SWD3 demo package", + packages=setuptools.find_packages(), + classifiers=[ + "Programming Language :: Python :: 3.9", + "Intended Audience :: Science/Research/Learning", + ], + python_requires=">=3.9", +) +``` + +## Local Installation + +**Install:** install the hypot package into the environment using: + +```bash +$ python setup.py install +``` + +**Usage:** if you want to create a personalised script, you +can import the hypot modules as follows: + +```python +from hypot.calc import squared, addition, sqroot +``` + +**Remove:** If you want to remove your package, use pip: + +```bash +$ pip uninstall hypot +``` + +## Release + +Releases on GitHub are based on tags with the following structure: + +`v0.5.2` + +| Change | Release | Example | +| ------ | -------- | ------- | +| Major | Breaking | 0 | +| Minor | Feature | 5 | +| Patch | Fix | 2 | + diff --git a/swd3_2024/sdlc.qmd b/swd3_2024/sdlc.qmd new file mode 100644 index 0000000..675d46b --- /dev/null +++ b/swd3_2024/sdlc.qmd @@ -0,0 +1,163 @@ +## Software Development Life Cycle (SDLC) + +![](./assets/img/sdlc/software-lifecicle.jpg) + +## SDLC {.smaller} + +::: {.panel-tabset} + +### Ideation + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +What are we going to do?
+:::::: + +- Brainstorming +- Research +::::: + +::::: {.column width="20%"} +![](./assets/img/sdlc/ideation.jpg) +::::: +:::: + +### Requirements + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +How are we going to do it? +:::::: + +Some topics to help define requirements include: + +- final goal +- project scope (how to reach the final goal) +- what is feasible (and how) +- what is priority +- what resources are available +- deadlines +- potential risks + +:::::: {.warning} +Warning: Each person involved in the project may have a different need. +:::::: +::::: + +::::: {.column width="20%"} +![](./assets/img/sdlc/requirements.png) +::::: +:::: + +### Design + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +What is the software architecture? +:::::: + +When designing software, the object-oriented approach is a common programming paradigm. + +Object-oriented components: + +- **Classes:** A user-defined type +- **Object instances:** A particular object instantiated from a class. +- **Methods:** A function which is “built in” to a class +- **Constructor:** A special method called when instantiating a new object + +Some principles: abstraction, encapsulation, decomposition, generalisation + +:::: +::::: {.column width="20%"} +![](./assets/img/sdlc/design.png) + +See more: +[![](./assets/img/sdlc/uml.png)](https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-class-diagram-tutorial/) +::::: +:::: + +### Development + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +Is this where the fun begins? +:::::: + +:::::: {.highlight} +Take your time +:::::: +::::: + +::::: {.column width="20%"} +![](./assets/img/sdlc/dev.png) +::::: +:::: + +Development is usually the most time consuming step in a Software Development Life Cycle. + +### Test + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +Is this software good? 
+:::::: + +In this step, errors and failures are identified by exposing the code to an environment similar to the end-user experience. + +There are several types of testing. Some examples include: + +- **Unit testing:** are all components working? +- **Integration testing:** are all components working when fitted together? +- **Performance testing:** how does the software perform against different workloads? Is it fast? Stable? +- **Functional testing:** is the software aligned with the Software Requirements Specification? +::::: + +::::: {.column width="20%"} +![](./assets/img/sdlc/test.png) +::::: +:::: + +### Deployment + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +Can other people use my code? +:::::: + +You can use platforms like [GitHub](https://github.com/) to release your software. + +- The **functionality** of the software is linked to **several specifications** related to the operating system and versions of packages and other software related to the project. +- **Listing these specifications will help** others to replicate the environment in which the software was developed. +::::: + +::::: {.column width="20%"} +![](./assets/img/sdlc/deployment.png) +::::: +:::: + +### Maintenance + +:::: {.columns} +::::: {.column width="80%"} +:::::: {.subhead} +Is it over? +:::::: + +We can classify maintenance into a few categories: + +- **Corrective:** fix reported errors/failures. +- **Preventive:** regular checks and fixes. +- **Perfective:** optimize implemented features, add new features. +- **Adaptive:** keep the software updated according to changes external to the project (new programming language version, new regulation, etc.).
+::::: +::::: {.column width="20%"} +![](./assets/img/sdlc/maintenance.png) +::::: +:::: +::: diff --git a/swd3_2024/structure.qmd b/swd3_2024/structure.qmd new file mode 100644 index 0000000..cd82821 --- /dev/null +++ b/swd3_2024/structure.qmd @@ -0,0 +1,64 @@ +## Basic Structure Suggestion {.smaller} + +```{.bash} +# The most basic structure for a code project should look like: +my-model +├── README.md +├── requirements.txt +├── src <- Source code for this project +└── tests <- Test code for this project +``` + +::: {.panel-tabset} + + + +### Readme + +- Is a guide that gives users a detailed description of a project you have worked on +- It is the first file a person will see when they encounter your project, so it should be fairly brief but detailed. +- See how to write a good README file in this [`freecodecamp` post](https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/). + +### Requirements + +- Text information about all the necessary additional libraries, modules, and packages. +- This can be replaced by files like: [`environment.yml`](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file), [`pyproject.toml`](https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-pyproject-toml), [`setup.py`](https://www.pythonforthelab.com/blog/how-create-setup-file-your-project/). + +::: + +## Advanced Project Structure {.smaller} + +Template based on [mkrapp/cookiecutter-reproducible-science github](https://github.com/mkrapp/cookiecutter-reproducible-science) + +```bash +. +├── AUTHORS.md +├── LICENSE +├── README.md +├── bin <- Your compiled model code can be stored here (not tracked by git) +├── config <- Configuration files, e.g., for doxygen or for your model if needed +├── data +│ ├── external <- Data from third party sources. +│ ├── interim <- Intermediate data that has been transformed. 
+│ ├── processed <- The final, canonical data sets for modeling. +│ └── raw <- The original, immutable data dump. +├── docs <- Documentation, e.g., doxygen or scientific papers (not tracked by git) +├── notebooks <- Ipython or R notebooks +├── reports <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports +│   └── figures <- Figures for the manuscript or reports +├── src <- Source code for this project +│ ├── data <- scripts and programs to process data +│ ├── external <- Any external source code, e.g., pull other git projects, or external libraries +│ ├── models <- Source code for your own model +│ ├── tools <- Any helper scripts go here +│ └── visualization <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related. +└── tests <- Test code for this project +``` \ No newline at end of file From 4792ba8d288afc93f43c52599809e00f0d5b5a19 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 8 Jul 2024 16:28:28 +0000 Subject: [PATCH 2/7] Add in mermaid diagrams for git --- swd3_2024.qmd | 15 ++-- swd3_2024/new-content.qmd | 177 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 185 insertions(+), 7 deletions(-) create mode 100644 swd3_2024/new-content.qmd diff --git a/swd3_2024.qmd b/swd3_2024.qmd index 17b7ec1..5774760 100644 --- a/swd3_2024.qmd +++ b/swd3_2024.qmd @@ -5,7 +5,9 @@ format: navigation-mode: linear controls-layout: bottom-right controls: false - footer: "[Research IT Website]({{< var rc.website >}}) | [Research IT Query]({{< var rc.servicedesk >}}) | [Courses Material]({{< var rc.material >}})" + footer: "[Research IT Website]({{< var rc.website >}}) | [Research IT Query]({{< var rc.servicedesk >}}) | [Courses Material]({{< var rc.material >}}) | [Useful Links](#useful-links)" + mermaid: + theme: neutral name: Software development practices for Research code: SWD3_2024 --- @@ -13,12 +15,11 @@ code: SWD3_2024 {{< include _title.qmd >}} {{< include _team.qmd >}} + +{{< include 
swd3_2024/new-content.qmd >}} + + ## Useful Links - [GitHub Project Demo](https://github.com/ARCTraining/swd3-demo) -- [Alan Turing Institute - Research Software Engineering Course Material](https://alan-turing-institute.github.io/rse-course/html/index.html) - -{{< include swd3_2024/sdlc.qmd >}} -{{< include swd3_2024/structure.qmd >}} -{{< include swd3_2024/good-practices.qmd >}} -{{< include swd3_2024/project.qmd >}} +- [Alan Turing Institute - Research Software Engineering Course Material](https://alan-turing-institute.github.io/rse-course/html/index.html) \ No newline at end of file diff --git a/swd3_2024/new-content.qmd b/swd3_2024/new-content.qmd new file mode 100644 index 0000000..09fe7c7 --- /dev/null +++ b/swd3_2024/new-content.qmd @@ -0,0 +1,177 @@ +## Presentation content + +Note: [Useful Links](#useful-links) are compiled at the end of this presentation. + +## Why apply software dev principles to your coding? + +```{mermaid} +flowchart LR + subgraph lab[1. Lab analysis of samples] + direction TB + A[Primary Standards: known comp. - P1] --> + B(Samples: unknown comp.) --> + C[Primary Standards again: known comp. - P2] + end + subgraph inst[2. Instrument validation after data collection] + direction LR + D[/Do P1 and P2
match each other
within error?/]-->|Yes| F + D -->|No| E + E(Instrument drift) + F[/Do P1 and P2 match
published values
within error?/] + F -->|No| G + G(Calibration issue) + end + lab ---> inst + F -->|Yes| pos + E --> neg + G --> neg + neg(fa:fa-ban Results not valid) + pos(Results may be valid) + pos --> posnext[Test scientific
validity of results] + neg -.-> negnext[Check instrument settings
Rerun analyses] +``` + +- Without the above documented steps, my results would not be publishable or considered in any way robust +- How do we implement a similar workflow for computational research? + - We treat code as a laboratory instrument! + +# Anything worth doing, is worth doing well + + +# Anything worth doing well, is worth doing poorly at first + + +# Using git + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" type:REVERSE +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" 
type:REVERSE + checkout main + branch new-feature-02 + commit + commit +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" type:REVERSE + checkout main + branch new-feature-02 + commit + commit + checkout main + merge new-feature-02 id: "Tests pass still" +``` \ No newline at end of file From c2c71e65fd4845bdb078fe05335f5df913067f47 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Tue, 9 Jul 2024 11:11:10 +0000 Subject: [PATCH 3/7] Adding notes to slides --- swd3_2024/new-content.qmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/swd3_2024/new-content.qmd b/swd3_2024/new-content.qmd index 09fe7c7..5243fd2 100644 --- a/swd3_2024/new-content.qmd +++ b/swd3_2024/new-content.qmd @@ -174,4 +174,6 @@ gitGraph commit checkout main merge new-feature-02 id: "Tests pass still" -``` \ No newline at end of file +``` + +## \ No newline at end of file From 2ca157c7d1f24df1165e61f13d3369920f7ca65e Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Tue, 9 Jul 2024 19:30:30 +0000 Subject: [PATCH 4/7] Update content and add a workfow dispatch --- .github/workflows/quarto-publish.yml | 1 + swd3_2024/new-content.qmd | 67 +++++++++++++++++++++++++++- 2 files changed, 66 insertions(+), 2 deletions(-) diff --git a/.github/workflows/quarto-publish.yml b/.github/workflows/quarto-publish.yml index 77c5fe7..4076925 100644 --- a/.github/workflows/quarto-publish.yml +++ b/.github/workflows/quarto-publish.yml @@ -1,6 +1,7 @@ on: push: branches: main + workflow_dispatch: name: Render and Publish diff --git a/swd3_2024/new-content.qmd b/swd3_2024/new-content.qmd index 5243fd2..38f7a23 100644 --- a/swd3_2024/new-content.qmd +++ b/swd3_2024/new-content.qmd @@ -1,6 +1,69 @@ -## 
Presentation content +## Software Development Skills for Research Computing + +- Learn to apply some basic software development skills and tools to your code +- Make your research computing more robust and reproducible +- Discover some frameworks and methods that can help you write better code +- Point you towards resources + +## Agenda + +| Start time | End time | Duration | Content | +|---|---|---|---| +| **10:00** | **10:50** | **50 min** | **Intro presentation** | +| 10:50 | 11:00 | _10 min_ | _Short break_ | +| **11:00** | **12:00** | **60 min** | **Version control and project organisation** | +| 12:00 | 13:00 | _60 min_ | _Lunch_ | +| **13:00** | **13:50** | **50 min** | **Testing and linting code** | +| 13:50 | 14:00 | _10 min_ | _Short break_ | +| **14:00** | **14:45** | **45 min** | **Documentation and automated workflows** | +| 14:45 | 15:00 | _15 min_ | _Short break_ | +| **15:00** | **15:45** | **45 min** | **Packaging and releases** | +| **15:45** | **16:00** | **15 min** | **Questions, wrap-up** | -Note: [Useful Links](#useful-links) are compiled at the end of this presentation. +## Why apply software dev principles to your coding? + +An example from my research: Electron Microprobe Analysis + +```{mermaid} +flowchart LR + subgraph lab[1. Lab analysis of samples] + direction TB + A[Primary Standards: known comp. - P1] --> + B(Samples: unknown comp.) --> + C[Primary Standards again: known comp. - P2] + end + lab -..-> END[2. Instrument validation after data collection] +``` + +- Bracket samples with standards of known composition (published and trusted standards) + + +## Why apply software dev principles to your coding? + +```{mermaid} +flowchart LR + subgraph inst[2. Instrument validation after data collection] + direction LR + D[/Do P1 and P2
match each other
within error?/]-->|Yes| F + D -->|No| E + E(Instrument drift) + F[/Do P1 and P2 match
published values
within error?/] + F -->|No| G + G(Calibration issue) + end + START[1. Lab analysis of samples] -.-> inst + F -->|Yes| pos + E --> neg + G --> neg + neg(fa:fa-ban Results not valid) + pos(Results may be valid) + pos --> posnext[Test scientific
validity of results] + neg -.-> negnext[Check instrument settings
Rerun analyses] +``` + +- Compare standards to each other to see results are consistent over time +- Compare standards to their published compositions + - Well-established allowable error ## Why apply software dev principles to your coding? From facc72af757d371b84099beeed9d92c28976ae0d Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Mon, 22 Jul 2024 16:57:53 +0100 Subject: [PATCH 5/7] Add in extra work on slides --- index.qmd | 2 +- swd3_2024.qmd | 4 +- swd3_2024/swd3.qmd | 365 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 368 insertions(+), 3 deletions(-) create mode 100644 swd3_2024/swd3.qmd diff --git a/index.qmd b/index.qmd index ee7799a..cbcf67f 100644 --- a/index.qmd +++ b/index.qmd @@ -22,7 +22,7 @@ the link will be provided. - [ ] SWD1b: Introduction to R programming - [ ] SWD2: Version Control with Git and GitHub - [x] [SWD3: Software development practices for Research](./swd3.qmd) -- [x] [SWD3 2024: Software development practices for Research using Cloud](./swd3_2024.qmd) +- [x] [SWD3 2024: Software development practices for Research with Python (2024 update)](./swd3_2024.qmd) - [ ] SWD4: Cloud computing for Research - [ ] SWD5: Scientific Python - [ ] SWD6: High performance Python diff --git a/swd3_2024.qmd b/swd3_2024.qmd index 5774760..9bf8d7c 100644 --- a/swd3_2024.qmd +++ b/swd3_2024.qmd @@ -9,14 +9,14 @@ format: mermaid: theme: neutral name: Software development practices for Research -code: SWD3_2024 +code: SWD3 2024 --- {{< include _title.qmd >}} {{< include _team.qmd >}} -{{< include swd3_2024/new-content.qmd >}} +{{< include swd3_2024/swd3.qmd >}} ## Useful Links diff --git a/swd3_2024/swd3.qmd b/swd3_2024/swd3.qmd new file mode 100644 index 0000000..98a93fb --- /dev/null +++ b/swd3_2024/swd3.qmd @@ -0,0 +1,365 @@ +## Software Development Skills for Research Computing + +During this course, you will: + +- Learn to apply basic software development practices to improve your code +- Get to grips with organising your 
code-base +- Develop a blueprint for dealing with dependencies, conda environments, and code versions +- Learn about various tools and resources you can implement in the future + +## Software Development Skills for Research Computing + +During this course, you will **not**: + +- Learn **best practice** software development: we are going for a *good enough* approach as opposed to perfect, but can point you to resources if you want to learn more +- Become a software developer overnight: it takes practise! +- Learn the complicated mathematics behind your numerical models or statistical analysis, or how to implement these in Python + +## Agenda + +| Start time | End time | Duration | Content | +|---|---|---|---| +| **10:00** | **10:50** | **50 min** | **Intro presentation** | +| 10:50 | 11:00 | _10 min_ | _Short break_ | +| **11:00** | **12:00** | **60 min** | **Version control and project organisation** | +| 12:00 | 13:00 | _60 min_ | _Lunch_ | +| **13:00** | **13:50** | **50 min** | **Testing and linting code** | +| 13:50 | 14:00 | _10 min_ | _Short break_ | +| **14:00** | **14:45** | **45 min** | **Documentation and automated workflows** | +| 14:45 | 15:00 | _15 min_ | _Short break_ | +| **15:00** | **15:45** | **45 min** | **Packaging and releases** | +| **15:45** | **16:00** | **15 min** | **Questions, wrap-up** | + +## Course notes + +- [Documentation and detailed notes](https://murphyqm.github.io/swd3-notes/){preview-link="true"} +- [DeReLiCT code](https://derelict.streamlit.app/) - the bare minimum to stop your code falling down +- [Basic Python Project Structure](https://package-your-python.streamlit.app/) - interactive webapp to generate code snippets to set up your project + + +## Why apply software dev principles to your coding? + +An example from my research: Electron Microprobe Analysis + +```{mermaid} +flowchart LR + subgraph lab[1. Lab analysis of samples] + direction TB + A[Primary Standards: known comp. - P1] --> + B(Samples: unknown comp.) 
--> + C[Primary Standards again: known comp. - P2] + end + lab -..-> END[2. Instrument validation after data collection] +``` + +- Bracket samples with standards of known composition (published and trusted standards) + +## Why apply software dev principles to your coding? + +```{mermaid} +flowchart LR + subgraph inst[2. Instrument validation after data collection] + direction LR + D[/Do P1 and P2
match each other
within error?/]-->|Yes| F + D -->|No| E + E(Instrument drift) + F[/Do P1 and P2 match
published values
within error?/] + F -->|No| G + G(Calibration issue) + end + START[1. Lab analysis of samples] -.-> inst + F -->|Yes| pos + E --> neg + G --> neg + neg(fa:fa-ban Results not valid) + pos(Results may be valid) + pos --> posnext[Test scientific
validity of results] + neg -.-> negnext[Check instrument settings
Rerun analyses] +``` + +- Compare standards to each other to see results are consistent over time +- Compare standards to their published compositions + - Well-established allowable error + +## Why apply software dev principles to your coding? + +```{mermaid} +flowchart LR + subgraph lab[1. Lab analysis of samples] + direction TB + A[Primary Standards: known comp. - P1] --> + B(Samples: unknown comp.) --> + C[Primary Standards again: known comp. - P2] + end + subgraph inst[2. Instrument validation after data collection] + direction LR + D[/Do P1 and P2
match each other
within error?/]-->|Yes| F + D -->|No| E + E(Instrument drift) + F[/Do P1 and P2 match
published values
within error?/] + F -->|No| G + G(Calibration issue) + end + lab ---> inst + F -->|Yes| pos + E --> neg + G --> neg + neg(fa:fa-ban Results not valid) + pos(Results may be valid) + pos --> posnext[Test scientific
validity of results] + neg -.-> negnext[Check instrument settings
Rerun analyses] +``` + +- Without the above documented steps, my results would not be publishable or considered in any way robust +- How do we implement a similar workflow for computational research? + - We treat code as a laboratory instrument! + +## GitHub Codespaces and devcontainers + +- Today, we are going to be using GitHub Codespaces to run our code +- This is essentially just a remote Linux machine running in the cloud +- You get restricted free access (120 core-hours per month) which is plenty for this course +- When using what we've discussed for your own research, install everything locally +- We have created a template repository for you to use + +> Make sure you have a GitHub account and know your login details! + +## Version Control {.smaller} + +![[Piled Higher and Deeper by Jorge Cham](http://www.phdcomics.com)](./assets/img/git/phd101212s.png) + +## Version Control + +- Manual: naming files `v1`, `v2`, etc. +- Automated: using Track Changes in Word documents, Overleaf etc. +- Automated plain text: using SVN, **git** etc. 
+ +We are going to use [`git`](https://git-scm.com/): + +- Free, open source +- Simple, easy to learn +- Fast +- Very widely used within research community +- Lots of tools built around it + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" type:REVERSE +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" 
type:REVERSE + checkout main + branch new-feature-02 + commit + commit +``` + +## + +### git workflow + +```{mermaid} +gitGraph + commit id: "First commit" + commit id: "Add README.md" + branch first-feature + checkout first-feature + commit id: "Adding code" + commit + commit + checkout main + merge first-feature id: "Tests pass" + branch new-feature + checkout new-feature + commit + commit id: "Tests fail!" type:REVERSE + checkout main + branch new-feature-02 + commit + commit + checkout main + merge new-feature-02 id: "Tests pass still" +``` + +## Project organisation + +What does your project currently look like? + +- Lots of Python scripts in different folders? +- Very long, convoluted Python files? +- Tests? +- Comments? + +How do you share your Python work? + +How do you record what version of each script you used? + +How do you transfer your work to the HPC system and back? + + +## Basic Structure Suggestion {.smaller} + +```{.bash} +# The most basic structure for a code project should look like: +my-model +├── README.md +├── requirements.txt +├── src <- Source code for this project +└── tests <- Test code for this project +``` + +::: {.panel-tabset} + + + +### Readme + +- Is a guide that gives users a detailed description of a project you have worked on +- It is the first file a person will see when they encounter your project, so it should be fairly brief but detailed. +- See how to write a good README file in this [`freecodecamp` post](https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/). + +### Requirements + +- Text information about all the necessary additional libraries, modules, and packages. 
+- This can be replaced by files like: [`environment.yml`](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file), [`pyproject.toml`](https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-pyproject-toml), [`setup.py`](https://www.pythonforthelab.com/blog/how-create-setup-file-your-project/). + +::: + +## Advanced Project Structure {.smaller} + +Template based on [mkrapp/cookiecutter-reproducible-science github](https://github.com/mkrapp/cookiecutter-reproducible-science) + +```bash +. +├── AUTHORS.md +├── LICENSE +├── README.md +├── bin <- Your compiled model code can be stored here (not tracked by git) +├── config <- Configuration files, e.g., for doxygen or for your model if needed +├── data +│ ├── external <- Data from third party sources. +│ ├── interim <- Intermediate data that has been transformed. +│ ├── processed <- The final, canonical data sets for modeling. +│ └── raw <- The original, immutable data dump. +├── docs <- Documentation, e.g., doxygen or scientific papers (not tracked by git) +├── notebooks <- Ipython or R notebooks +├── reports <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports +│   └── figures <- Figures for the manuscript or reports +├── src <- Source code for this project +│ ├── data <- scripts and programs to process data +│ ├── external <- Any external source code, e.g., pull other git projects, or external libraries +│ ├── models <- Source code for your own model +│ ├── tools <- Any helper scripts go here +│ └── visualization <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related. 
+└── tests <- Test code for this project +``` + +## Testing code + +## Linting and Formatting code + +## Documentation + +## Releases on GitHub \ No newline at end of file From 9526b9bf20389a7a9ed70f9b8e883a6de50359a6 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Tue, 23 Jul 2024 10:23:55 +0100 Subject: [PATCH 6/7] Update slide content --- swd3_2024/swd3.qmd | 229 +++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 212 insertions(+), 17 deletions(-) diff --git a/swd3_2024/swd3.qmd b/swd3_2024/swd3.qmd index 98a93fb..4a8ffaf 100644 --- a/swd3_2024/swd3.qmd +++ b/swd3_2024/swd3.qmd @@ -124,6 +124,8 @@ flowchart LR > Make sure you have a GitHub account and know your login details! +# Version Control + ## Version Control {.smaller} ![[Piled Higher and Deeper by Jorge Cham](http://www.phdcomics.com)](./assets/img/git/phd101212s.png) @@ -152,6 +154,10 @@ gitGraph commit id: "Add README.md" ``` +- Make some change to file `README.md` +- Add the file: `git add README.md` +- Commit the file with a message: `git commit -m "My note goes here"` + ## ### git workflow @@ -165,6 +171,10 @@ gitGraph commit id: "Adding code" ``` +- Create a new branch called `first-feature`: `git branch first-feature` +- Swap over to that branch: `git checkout first-feature` +- Then the usual add and commit: `git add .`, `git commit` -> without the `-m` for message, this will open a text editor for you to add a message + ## ### git workflow @@ -182,6 +192,9 @@ gitGraph merge first-feature id: "Tests pass" ``` +- After making a series of changes, we can run tests on our code +- We can merge the changes back to the main branch if we are happy + ## ### git workflow @@ -275,6 +288,31 @@ gitGraph merge new-feature-02 id: "Tests pass still" ``` +## + +## Version control + +### Essential git commands + +We will implement these later! 
+ +```bash +git status # check on status of current git repo +git branch NAME # create a branch called NAME +git checkout NAME # swap over to the branch called NAME +git add . # stage all changed files for commit, you can replace "." with FILE to add a single file called FILE +git commit # commit the staged files (this will open your text editor to create a commit message) +git push origin NAME # push local commits to the remote branch tracking the branch NAME +``` + +## Version control + +- All your files and the git history will be stored in a public repository on GitHub +- Transparency, easy to see your process, useful for reviewing code +- Don't worry about your "messy workings" being visible - it's part of the scientific process + +# Project Organisation + ## Project organisation What does your project currently look like? @@ -295,38 +333,99 @@ How do you transfer your work to the HPC system and back? ```{.bash} # The most basic structure for a code project should look like: -my-model +my-package ├── README.md -├── requirements.txt +├── pyproject.toml ├── src <- Source code for this project └── tests <- Test code for this project ``` ::: {.panel-tabset} - +- Your python code, including an `__init__.py` file to turn it into a package ### Readme -- Is a guide that gives users a detailed description of a project you have worked on -- It is the first file a person will see when they encounter your project, so it should be fairly brief but detailed. +- This is a guide that gives users a detailed description of the contents of the repository: in this case, your Python package +- It is the first file a person will see when they encounter your project, so it should be succinct - See how to write a good README file in this [`freecodecamp` post](https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/). -### Requirements +### pyproject.toml + +- Text information about all the necessary additional libraries, the structure of the project, your name etc. 
+- Allows you to install the code in `src/` as a Python package to use elsewhere on your system +- Find out more about the format of the [`pyproject.toml`](https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-pyproject-toml) file +- This can be replaced by/supplemented by files like: [`environment.yml`](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file), [`setup.py`](https://www.pythonforthelab.com/blog/how-create-setup-file-your-project/). -- Text information about all the necessary additional libraries, modules, and packages. -- This can be replaced by files like: [`environment.yml`](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file), [`pyproject.toml`](https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-pyproject-toml), [`setup.py`](https://www.pythonforthelab.com/blog/how-create-setup-file-your-project/). +### tests + +- This folder contains tests that run small sections of your code with known expected results +- All tests units (files and methods) must be named starting with `test_` and placed inside a directory called `tests`. +- Tests can be grouped in just one folder for the entire repository or they can be organized within each package/subpackage. 
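
The naming convention above can be sketched as a single test file. This is a minimal illustration only: `celsius_to_kelvin` is a hypothetical stand-in for a function that would normally live in `src/`, and the layout assumes a test runner such as `pytest`, which by default collects files named `test_*.py` and functions whose names start with `test`.

```python
# tests/test_source.py (hypothetical file name following the test_ convention)
import math


def celsius_to_kelvin(temp_c):
    """Hypothetical function; in a real project this would be imported from src/."""
    return temp_c + 273.15


def test_celsius_to_kelvin_freezing_point():
    # A known input with a known expected result
    assert celsius_to_kelvin(0.0) == 273.15


def test_celsius_to_kelvin_room_temperature():
    # Compare floats with a tolerance rather than exact equality
    assert math.isclose(celsius_to_kelvin(25.0), 298.15, rel_tol=1e-9)
```

With a layout like this, running `pytest` from the repository root should pick up both test functions without any extra configuration.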
::: +## Two directory structure + +- I keep my large-scale code development separate from my scientific output +- For example, I want to analyse the thermal evolution of a planet for a scientific paper + - I build a numerical model of the heating and cooling of the planet + - I use this model to test a range of parameters and compare to various datasets +- I might be able to reuse my numerical model in other situations so want to keep this separate +- I know that my research process involves lots of exploratory plotting and analysis, which produces lots of scripts, and I don't want these to get mixed up with my model code + +## Repository 1: the numerical model as a Python package + +```text +planet-evolution/ The package git repository +├── src/ +│ └── planet_evolution/ +│ ├── __init__.py Makes the folder a package. +│ └── source.py An example module containing source code. +├── tests/ +| ├── __init__.py Sets up the test suite. +│ └── test_source.py A file containing tests for the code in source.py. +├── README.md README with information about the project. +├── docs Package documentation +├── pyproject.toml Allows me to install this as a package +├── LICENSE License text to allow for reuse +└── CITATION.cff Citation file that makes it easy for people to cite you! +``` + +This model can be installed as a package, cited in your research, and reused in a later project. + +## Repository 2: my scientific analysis + +```bash +pallasite-parent-body-evolution/ The project git repository +├── LICENSE +├── README.md +├── env.yml or requirements.txt The libraries I need for analysis (including planet_evolution!) +├── data I usually load in large data from storage elsewhere +│ ├── interim But sometimes do keep small summary datafiles in the repository +│ ├── processed +│ └── raw +├── docs Notes on analysis, process etc. 
+├── notebooks Jupyter notebooks used for analysis +├── reports For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports +│   └── figures Figures for the manuscript or reports +├── src Source code for this project +│ ├── data Scripts and programs to process data +│ ├── tools Any helper scripts go here +│ └── visualization Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related. +└── tests Test code for this project, benchmarking, comparison to analytical models +``` + +::: {style="font-size: 50%;"} + +This is the actual work for the scientific project - while others are unlikely to use this code as-is, it's public and citeable so that you can point to a specific version in your published paper and readers can reproduce your work with it if they wish. + +Adapted/modified from [mkrapp/cookiecutter-reproducible-science github](https://github.com/mkrapp/cookiecutter-reproducible-science) + +::: + ## Advanced Project Structure {.smaller} Template based on [mkrapp/cookiecutter-reproducible-science github](https://github.com/mkrapp/cookiecutter-reproducible-science) @@ -356,10 +455,106 @@ Template based on [mkrapp/cookiecutter-reproducible-science github](https://gith └── tests <- Test code for this project ``` +# Testing code + +## Testing code + +Remember our example of using known standards to check the instruments in the lab? + +This is the equivalent for computational work! + +- Tests ensure that your code runs in the way it's intended to +- Tests will flag if any changes you made either + - Produce an error or break the code + - "Silently" introduce errors - the code still runs, but the output is different + + ## Testing code -## Linting and Formatting code +::: {style="font-size: 50%;"} + +> The good news is, you’ve probably already created a test without realizing it. Remember when you ran your application and used it for the first time? Did you check the features and experiment using them? 
That’s known as exploratory testing and is a form of manual testing. +> +> Exploratory testing is a form of testing that is done without a plan. In an exploratory test, you’re just exploring the application. +> +> To have a complete set of manual tests, all you need to do is make a list of all the features your application has, the different types of input it can accept, and the expected results. Now, every time you make a change to your code, you need to go through every single item on that list and check it. +> +> That doesn’t sound like much fun, does it? + +From [RealPython: Getting Started With Testing in Python](https://realpython.com/python-testing/) +::: + +There are lots of great, accessible resources for learning about testing and implementing it: + +- [Hitchhiker's Guide to Python: Testing your code](https://docs.python-guide.org/writing/tests/) +- [A gentle introduction to Unit Testing in Python](https://machinelearningmastery.com/a-gentle-introduction-to-unit-testing-in-python/) +- [RealPython: Getting Started With Testing in Python](https://realpython.com/python-testing/) + + +## Testing code + + +Python tests generally rely on `assert` statements or similar, where the test passes `if`: + +```python +package_function_output == expected_example_output +``` + +::: {style="font-size: 50%;"} +N.B. rarely in scientific applications can we use `==` as we are often dealing with floats and some degree of error; we will discuss the various alternatives that allow tolerances during the testing session. 
+::: + +::: {.panel-tabset} +### Unit Tests + +- Tests each individual little piece of code +- Each function in your project should have a unit test +- Tests edge cases + +### Integration Tests + +- Tests how the package works together as a whole +- Tests various combinations of functions +- Tests the full workflow you use for research + +::: + +## Testing your science + +This is often where tutorials stop when it comes to code testing: but you also need to scientifically validate your code! + +- Unit Testing and Integration Testing check that the code is functional; they do not check for scientific validity +- Depending on your area/the code you are writing, you might need to test for: + - Numerical precision and accuracy + - Stability + - Agreement with previous numerical models/analytical solutions* + - Scientific sense: does the answer make physical sense? (Does the thing cool down when you expect it to? Does time run forwards?) + +::: {style="font-size: 50%;"} +There are other problems with the circle of purely validating numerical models against other numerical models... but that is too long a debate for today! +::: + +## Testing your science + +- In our two-repository setup, tests for science can be split across both: + - Some tests will always need to be true (the model should *always* be cooling, gravitational acceleration should *always* be >20 ms^-2 on a giant planet etc.) + - Some tests will be specific to your application for a scientific output and can live in that second repository + +::: {style="font-size: 50%;"} +* There are other problems with the circle of purely validating numerical models against other numerical models... but that is too long a debate for today! +::: + +# Linting and Formatting code + +- This is the equivalent of spellchecker for your code +- Do yourself a favour and ensure whatever IDE you are using has this enabled! 
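
To make the "spellchecker" analogy concrete, here is a toy sketch of the kind of static check a linter performs: it reads the source text without running it and reports imports that are never used. The function name `unused_imports` is made up for this illustration, and real linters such as flake8 check far more than this.

```python
import ast


def unused_imports(source):
    """Return a sorted list of names that are imported but never referenced."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as x" binds "x"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Every bare name that appears in the code counts as "used"
            used.add(node.id)
    return sorted(imported - used)


example = "import os\nimport math\n\nprint(math.pi)\n"
print(unused_imports(example))  # prints ['os']
```

The example script is only parsed, never executed, which is why a linter can safely run on every save in your IDE.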
+ +You can also format the code after the fact: + +- [Black formatter](https://black.vercel.app/){preview-link="true"} + +# Documentation -## Documentation +**Comment your code!** -## Releases on GitHub \ No newline at end of file +# Releases on GitHub \ No newline at end of file From 9833c7d2e9aa4eab93a18edd1ad59e52dfc06b36 Mon Sep 17 00:00:00 2001 From: Maeve Murphy Quinlan Date: Wed, 24 Jul 2024 09:09:23 +0100 Subject: [PATCH 7/7] Adding new content for swd32024 --- swd3_2024/good-practices.qmd | 241 ---------------------------------- swd3_2024/new-content.qmd | 242 ----------------------------------- swd3_2024/project.qmd | 183 -------------------------- swd3_2024/sdlc.qmd | 163 ----------------------- swd3_2024/structure.qmd | 64 --------- swd3_2024/swd3.qmd | 204 ++++++++++++++++++++++++++++- 6 files changed, 199 insertions(+), 898 deletions(-) delete mode 100644 swd3_2024/good-practices.qmd delete mode 100644 swd3_2024/new-content.qmd delete mode 100644 swd3_2024/project.qmd delete mode 100644 swd3_2024/sdlc.qmd delete mode 100644 swd3_2024/structure.qmd diff --git a/swd3_2024/good-practices.qmd b/swd3_2024/good-practices.qmd deleted file mode 100644 index 04c906d..0000000 --- a/swd3_2024/good-practices.qmd +++ /dev/null @@ -1,241 +0,0 @@ -## Virtual Environments {.smaller} - -If application A needs version 1.0 of a particular module but application B -needs version 2.0, then the requirements are in conflict and installing either -version 1.0 or 2.0 will leave one application unable to run. - -The solution for this problem is to create a virtual environment, a -self-contained directory tree that contains installation for particular versions -of software/packages. - -### Conda - -- [Conda](https://docs.conda.io/en/latest/) is an open source package management -system and environment management system that runs on Windows, macOS, and Linux. 
-- It offers dependency and environment management for any language—Python, R, -Ruby, Lua, Scala, Java, JavaScript, C/ C++, Fortran, and more. -- Easy user install via [Anaconda](https://www.anaconda.com/download). - - -## Code formatting - -```python -# myscript.py: -x = { 'a':37,'b':42, -'c':927} -y = 'hello '+ 'world' -class foo ( object ): - def f (self ): - return y **2 - def g(self, x :int, - y : int=42 - ) -> int: - return x--y -def f ( a ) : - return 37+-a[42-a : y*3] -``` - -## Coding conventions {.smaller} - -If your language or project has a standard policy, use that. For example: - -- Python: [PEP8](https://www.python.org/dev/peps/pep-0008/) -- R: [Google's guide for R](https://google.github.io/styleguide/Rguide.xml), [tidyverse style guide](https://style.tidyverse.org/) -- C++: [Google's style guide](https://google.github.io/styleguide/cppguide.html) -- Julia: [Official style guide](https://docs.julialang.org/en/v1/manual/style-guide/index.html) - -## Linters - -Linters are automated tools which enforce coding conventions and check for -common mistakes. 
For example: - -- Python: - - [flake8](https://flake8.pycqa.org/en/latest/index.html) (flags any syntax/style errors) - - [black](https://black.readthedocs.io/) (enforces the style) - - [isort](https://pycqa.github.io/isort/) ("Sorts" imports alphabetically in groups) - -## Example: Flake8 Linter - -```bash -$ conda install flake8 -$ flake8 myscript.py -myscript.py:2:6: E201 whitespace after '{' -myscript.py:2:11: E231 missing whitespace after ':' -myscript.py:2:14: E231 missing whitespace after ',' -myscript.py:2:18: E231 missing whitespace after ':' -myscript.py:3:1: E128 continuation line under-indented for visual indent -myscript.py:3:4: E231 missing whitespace after ':' -myscript.py:4:13: E225 missing whitespace around operator -myscript.py:4:14: E222 multiple spaces after operator -myscript.py:5:1: E302 expected 2 blank lines, found 0 -myscript.py:5:13: E201 whitespace after '(' -myscript.py:5:25: E202 whitespace before ')' -myscript.py:6:4: E111 indentation is not a multiple of 4 -myscript.py:6:9: E211 whitespace before '(' -myscript.py:6:20: E202 whitespace before ')' -myscript.py:7:8: E111 indentation is not a multiple of 4 -myscript.py:7:14: E271 multiple spaces after keyword -myscript.py:7:25: E225 missing whitespace around operator -myscript.py:8:4: E301 expected 1 blank line, found 0 -myscript.py:8:4: E111 indentation is not a multiple of 4 -myscript.py:8:17: E203 whitespace before ':' -myscript.py:8:18: E231 missing whitespace after ':' -myscript.py:9:8: E128 continuation line under-indented for visual indent -myscript.py:9:9: E203 whitespace before ':' -myscript.py:9:15: E252 missing whitespace around parameter equals -myscript.py:9:16: E252 missing whitespace around parameter equals -myscript.py:10:8: E124 closing bracket does not match visual indentation -myscript.py:10:8: E125 continuation line with same indent as next logical line -myscript.py:11:8: E111 indentation is not a multiple of 4 -myscript.py:12:1: E302 expected 2 blank lines, found 0 
-myscript.py:12:6: E211 whitespace before '(' -myscript.py:12:9: E201 whitespace after '(' -myscript.py:12:13: E202 whitespace before ')' -myscript.py:12:15: E203 whitespace before ':' -myscript.py:13:4: E111 indentation is not a multiple of 4 -myscript.py:13:10: E271 multiple spaces after keyword -myscript.py:13:26: E203 whitespace before ':' -myscript.py:13:34: W291 trailing whitespace -``` - -## Example: Black Code Formatter {.smaller} - -:::{.par_botton} -Install and run Black -::: -```bash -$ conda install black -$ black myscript.py -``` - -:::{.par_botton} -Check the file! -::: -```python -# myscript.py: -x = {"a": 37, "b": 42, "c": 927} -y = "hello " + "world" - - -class foo(object): - def f(self): - return y**2 - - def g(self, x: int, y: int = 42) -> int: - return x - -y - - -def f(a): - return 37 + -a[42 - a : y * 3] -``` - -## IDE {.smaller} - -Using an Integrated development environment (IDE) will certainly save you time, but the advantages of using an IDE go beyond that. Below are some IDE advantages - -1. Syntax highlighting -2. Text autocompletion -3. Refactoring options -4. Easily Importing libraries -5. Build, compile, or run - -### Visual Studio Code - -To install VS Code follow the instructions [here](https://code.visualstudio.com/). - -## VSC Example: automatically using black {.smaller} - -**Configure VSC to use Black**: Code (or File) > Preferences > Settings - -- Search for `python formatting provider` and choose `black` -- Search for `format on save` and check the box to enable - -**Select interpreter**: View > `Command Palette..` (or `Ctrl+Shift+P`) - -- Search for `Python: Select Interpreter` -- Choose the correct environment - -Now the Black package is going to fix your codes layout every time you save a -code file. 
- -## Version Control {.smaller} - -![[Piled Higher and Deeper by Jorge Cham](http://www.phdcomics.com)](./assets/img/git/phd101212s.png) - -## Test-driven development {.smaller} - -**Example**, suppose we need to find the result of a number divided by another number: - -::: {.panel-tabset} - -### Naive solution - -- Write a function a_div_b. -- Call it interactively on two or three different inputs. -- If it produces the wrong answer, fix the function and re-run that test. - -This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way - -### TDD solution - -- Write a short function for each test. -- Write a `a_div_b` function that should pass those tests. -- If `a_div_b` produces any wrong answers, fix it and re-run the test functions. - -Writing the tests before writing the function they exercise is called **test-driven development (TDD)**. -Its advocates believe it produces better code faster because: - -- If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors. -- Writing tests helps programmers figure out what the function is actually supposed to do. - -::: - -## Possible tests: `a_div_b` example {.smaller} - -Let's think in all possible scenarios for this problem and how we could test them. - -::: {.panel-tabset} - -### Bigger by smaller - -- Using `4` and `2`, the answer should be `2`. - -```python -assert a_div_b(4, 2) == 2 -``` - -- Or... the answer should be `larger` than `1`. - -```python -assert a_div_b(8, 7) > 1 -``` - -### Smaller by bigger - -- Using `2` and `4`, the answer should be `0.5`. - -```python -assert a_div_b(2, 4) == 0.5 -``` - -- Or... the answer should be `smaller` than `1`. - -```python -assert a_div_b(7, 8) < 1 -``` - -### Negative numbers - -- Using `-4` and `-2`, the answer should be `2`. - -```python -assert a_div_b(-4, -2) == 2 -``` - -- Or... 
the answer should be `positive`. - -```python -assert a_div_b(-4, -2) > 0 -``` - -::: diff --git a/swd3_2024/new-content.qmd b/swd3_2024/new-content.qmd deleted file mode 100644 index 38f7a23..0000000 --- a/swd3_2024/new-content.qmd +++ /dev/null @@ -1,242 +0,0 @@ -## Software Development Skills for Research Computing - -- Learn to apply some basic doftware development skills and tools to your code -- Make your research computing more robust and reproducible -- Discover some frameworks and methods that can help you write better code -- Point you towards resources - -## Agenda - -| Start time | End time | Duration | Content | -|---|---|---|---| -| **10:00** | **10:50** | **50 min** | **Intro presentation** | -| 10:50 | 11:00 | _10 min_ | _Short break_ | -| **11:00** | **12:00** | **60 min** | **Version control and project organisation** | -| 12:00 | 13:00 | _60 min_ | _Lunch_ | -| **13:00** | **13:50** | **50 min** | **Testing and linting code** | -| 13:50 | 14:00 | _10 min_ | _Short break_ | -| **14:00** | **14:45** | **45 min** | **Documentation and automated workflows** | -| 14:45 | 15:00 | _15 min_ | _Short break_ | -| **15:00** | **15:45** | **45 min** | **Packaging and releases** | -| **15:45** | **16:00** | **15 min** | **Questions, wrap-up** | - -## Why apply software dev principles to your coding? - -An example from my research: Electron Microprobe Analysis - -```{mermaid} -flowchart LR - subgraph lab[1. Lab analysis of samples] - direction TB - A[Primary Standards: known comp. - P1] --> - B(Samples: unknown comp.) --> - C[Primary Standards again: known comp. - P2] - end - lab -..-> END[2. Instrument validation after data collection] -``` - -- Bracket samples with standards of known composition (published and trusted standards) - - -## Why apply software dev principles to your coding? - -```{mermaid} -flowchart LR - subgraph inst[2. Instrument validation after data collection] - direction LR - D[/Do P1 and P2
match each other
within error?/]-->|Yes| F - D -->|No| E - E(Instrument drift) - F[/Do P1 and P2 match
published values
within error?/] - F -->|No| G - G(Calibration issue) - end - START[1. Lab analysis of samples] -.-> inst - F -->|Yes| pos - E --> neg - G --> neg - neg(fa:fa-ban Results not valid) - pos(Results may be valid) - pos --> posnext[Test scientific
validity of results] - neg -.-> negnext[Check instrument settings
Rerun analyses] -``` - -- Compare standards to each other to see results are consistent over time -- Compare standards to their published compositions - - Well-established allowable error - -## Why apply software dev principles to your coding? - -```{mermaid} -flowchart LR - subgraph lab[1. Lab analysis of samples] - direction TB - A[Primary Standards: known comp. - P1] --> - B(Samples: unknown comp.) --> - C[Primary Standards again: known comp. - P2] - end - subgraph inst[2. Instrument validation after data collection] - direction LR - D[/Do P1 and P2
match each other
within error?/]-->|Yes| F - D -->|No| E - E(Instrument drift) - F[/Do P1 and P2 match
published values
within error?/] - F -->|No| G - G(Calibration issue) - end - lab ---> inst - F -->|Yes| pos - E --> neg - G --> neg - neg(fa:fa-ban Results not valid) - pos(Results may be valid) - pos --> posnext[Test scientific
validity of results] - neg -.-> negnext[Check instrument settings
Rerun analyses] -``` - -- Without the above documented steps, my results would not be publishable or considered in any way robust -- How do we implement a similar workflow for computational research? - - We treat code as a laboratory instrument! - -# Anything worth doing, is worth doing well - - -# Anything worth doing well, is worth doing poorly at first - - -# Using git - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" - commit - commit - checkout main - merge first-feature id: "Tests pass" -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" - commit - commit - checkout main - merge first-feature id: "Tests pass" - branch new-feature - checkout new-feature - commit -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" - commit - commit - checkout main - merge first-feature id: "Tests pass" - branch new-feature - checkout new-feature - commit - commit id: "Tests fail!" type:REVERSE -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" - commit - commit - checkout main - merge first-feature id: "Tests pass" - branch new-feature - checkout new-feature - commit - commit id: "Tests fail!" 
type:REVERSE - checkout main - branch new-feature-02 - commit - commit -``` - -## - -### git workflow - -```{mermaid} -gitGraph - commit id: "First commit" - commit id: "Add README.md" - branch first-feature - checkout first-feature - commit id: "Adding code" - commit - commit - checkout main - merge first-feature id: "Tests pass" - branch new-feature - checkout new-feature - commit - commit id: "Tests fail!" type:REVERSE - checkout main - branch new-feature-02 - commit - commit - checkout main - merge new-feature-02 id: "Tests pass still" -``` - -## \ No newline at end of file diff --git a/swd3_2024/project.qmd b/swd3_2024/project.qmd deleted file mode 100644 index a9f7c6b..0000000 --- a/swd3_2024/project.qmd +++ /dev/null @@ -1,183 +0,0 @@ -## Bringing it all together {.smaller} - -### The Hypotenuse Problem - -Calculating the hypotenuse - -$$ c = \sqrt{a^2 + b^2} $$ - - -General Design - -- 1 squared function -- 1 sum function -- 1 square root function -- 1 hypotenuse function that uses the other functions - - -## Workflow {.smaller} - -1. Install Git, Anaconda, VScode -2. Create a GitHub repository + Licence + .gitignore + Readme -3. Setup GH Action for testing (Python Application) -4. Clone GH repository in local machine -5. Create project structure (source and test folders) -6. Setup tests (start with `test_`) -7. Develop code -8. Add docstring (you can use `autoDocstring - Python Docstring Generator on VS Code`) -9. Lint code and tests -10. Push to github -11. EXTRA: Create Sphinx documentation -12. EXTRA: Setup file and local install -13. EXTRA: GH Release - -## Extra: Sphinx documentation {.smaller} - -- Create docstring for every function -- Install `sphinx` -- Start the basic structure using: `$ sphinx-quickstart docs` -- Use the apidoc to get docstrings: `$ sphinx-apidoc -o docs .` -- Edit files: - -::: {.panel-tabset} - -### `conf.py` - -- add extentions: `'sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc'`. 
-- change theme: `sphinx_rtd_theme` -- add the `src` (change the folder name as necessary!) folder as path: - -```python - import os - import sys - sys.path.insert(0, os.path.abspath('../src')) -``` - -### `index.rst` - -Add extra files after `Contents` - -``` -.. toctree:: - :maxdepth: 2 - :caption: Contents: - - dependencies - usage - functions -``` - -### `dependencies.rst` - -List all your dependencies: - -``` -Dependencies -============ - -- python -- pytest -- flake8 -- black -- sphinx -``` - -### `usage.rst` - -Explain how to use your software - -``` -Usage Guide -============ - -To start working with this repository you need to clone it onto your local -machine: :: - - $ git clone https://github.com/... - - -Next ... -``` - -### `functions.rst` - -Create a function file with the following: - -``` -API reference -============= - -.. automodule:: calc - :members: - :undoc-members: - :show-inheritance: -``` - -::: - -## Extra: documentation Action {.smaller} - -Create a new GH action to create a nice website for your documentation. 
- -- The action is available [here](https://github.com/patricia-ternes/hypot-2023/blob/main/.github/workflows/documentation.yml) -- You may need update GH Actions permissions to allow `write` -- After a successful documentation action, you need to select `gh-pages` branch to activate your website - -## Extra: Setup file - -Create a `setup.py` file like: - -```python -import setuptools - -with open("README.md", "r") as fh: - long_description = fh.read() - -setuptools.setup( - name="hypot", - version="0.1.0", - author="Patricia Ternes", - author_email="p.ternesdallagnollo@leeds.ac.uk", - description="The hypot SWD3 demo package", - packages=setuptools.find_packages(), - classifiers=[ - "Programming Language :: Python :: 3.9", - "Intended Audience :: Science/Research/Learning", - ], - python_requires=">=3.9", -) -``` - -## Local Installation - -**Install:** install the hypot package into the environment using: - -```bash -$ python setup.py install -``` - -**Usage:** if you want to create a personalised script, you -can import the hypot modules as follows: - -```python -from hypot.calc import squared, addition, sqroot -``` - -**Remove:** If you want to remove your package, use pip: - -```bash -$ pip uninstall hypot -``` - -## Release - -Release in GitHub are based in tags with the following structure: - -`v0.5.2` - -| Change | Release | Example | -| ------ | -------- | ------- | -| Major | Breaking | 0 | -| Minor | Feature | 5 | -| Patch | Fix | 2 | - diff --git a/swd3_2024/sdlc.qmd b/swd3_2024/sdlc.qmd deleted file mode 100644 index 675d46b..0000000 --- a/swd3_2024/sdlc.qmd +++ /dev/null @@ -1,163 +0,0 @@ -## Software Development Life Cycle (SDLC) - -![](./assets/img/sdlc/software-lifecicle.jpg) - -## SDLC {.smaller} - -::: {.panel-tabset} - -### Ideation - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -What are we going to do? 
-:::::: - -- Brainstorming -- Research -::::: - -::::: {.column width="20%"} -![](./assets/img/sdlc/ideation.jpg) -::::: -:::: - -### Requirements - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -How are we going to do it? -:::::: - -Some topics to help define requirements include: - -- final goal -- project scope (how to reach the final goal) -- what is feasible (and how) -- what is priority -- what resources are available -- deadlines -- potential risks - -:::::: {.warning} -Warning: Each person involved in the project may have a different need. -:::::: -::::: - -::::: {.column width="20%"} -![](./assets/img/sdlc/requirements.png) -::::: -:::: - -### Design - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -What is the software architecture? -:::::: - -When designing software, the object-oriented approach is a common programming paradigm. - -Object-oriented components: - -- **Classes:** A user-defined type -- **Object instances:** A particular object instantiated from a class. -- **Methods:** A function which is “built in” to a class -- **Constructor:** A special method called when instantiating a new object - -Some principles: abstraction, encapsulation, decomposition, generalisation - -:::: -::::: {.column width="20%"} -![](./assets/img/sdlc/design.png) - -See more: -[![](./assets/img/sdlc/uml.png)](https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-class-diagram-tutorial/) -::::: -:::: - -### Development - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -Is this where the fun begins? -:::::: - -:::::: {.highlight} -Take your time -:::::: -::::: - -::::: {.column width="20%"} -![](./assets/img/sdlc/dev.png) -::::: -:::: - -Development is usually the most time consuming step in a Software Development Life Cycle. - -### Test - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -Is this software good? 
-:::::: - -In this step, errors and failures are identified by exposing the code to an environment similar to the end-user experience. - -There are several types of testing, some examples include: - -- **Unit testing:** are all components working? -- **Integration testing:** are all components working when fitted together? -- **Performance testing:** how does the software perform against different workloads? It is fast? Stable? -- **Functional testing:** is the software aligned with Software Requirement Specification? -::::: - -::::: {.column width="20%"} -![](./assets/img/sdlc/test.png) -::::: -:::: - -### Deployment - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -Can other people use my code? -:::::: - -You can use platforms like [GitHub](https://github.com/) to release your software. - -- The **functionality** of the software is linked to **several specifications** related to the operating system and versions of packages and other software related to the project. -- **Listing these specifications will help** others to replicate the environment in which the software was developed. -::::: - -::::: {.column width="20%"} -![](./assets/img/sdlc/deployment.png) -::::: -:::: - -### Maintenance - -:::: {.columns} -::::: {.column width="80%"} -:::::: {.subhead} -Is it over? -:::::: - -We can classify maintenance into a few categories: - -- **Corrective:** fix reported errors/failures. -- **Preventive:** regular checks and fixes. -- **Perfective:** optimize implemented features, adding new features. -- **Adaptive:** keep the software updated according to changes external to the project (new programming language version, new regulation, etc.). 
-::::: -::::: {.column width="20%"} -![](./assets/img/sdlc/maintenance.png) -::::: -:::: -::: diff --git a/swd3_2024/structure.qmd b/swd3_2024/structure.qmd deleted file mode 100644 index cd82821..0000000 --- a/swd3_2024/structure.qmd +++ /dev/null @@ -1,64 +0,0 @@ -## Basic Structure Suggestion {.smaller} - -```{.bash} -# The most basic structure for a code project should look like: -my-model -├── README.md -├── requirements.txt -├── src <- Source code for this project -└── tests <- Test code for this project -``` - -::: {.panel-tabset} - - - -### Readme - -- Is a guide that gives users a detailed description of a project you have worked on -- It is the first file a person will see when they encounter your project, so it should be fairly brief but detailed. -- See how to write a good README file in this [`freecodecamp` post](https://www.freecodecamp.org/news/how-to-write-a-good-readme-file/). - -### Requirements - -- Text information about all the necessary additional libraries, modules, and packages. -- This can be replaced by files like: [`environment.yml`](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file), [`pyproject.toml`](https://packaging.python.org/en/latest/tutorials/packaging-projects/#creating-pyproject-toml), [`setup.py`](https://www.pythonforthelab.com/blog/how-create-setup-file-your-project/). - -::: - -## Advanced Project Structure {.smaller} - -Template based on [mkrapp/cookiecutter-reproducible-science github](https://github.com/mkrapp/cookiecutter-reproducible-science) - -```bash -. -├── AUTHORS.md -├── LICENSE -├── README.md -├── bin <- Your compiled model code can be stored here (not tracked by git) -├── config <- Configuration files, e.g., for doxygen or for your model if needed -├── data -│ ├── external <- Data from third party sources. -│ ├── interim <- Intermediate data that has been transformed. 
-│ ├── processed <- The final, canonical data sets for modeling. -│ └── raw <- The original, immutable data dump. -├── docs <- Documentation, e.g., doxygen or scientific papers (not tracked by git) -├── notebooks <- Ipython or R notebooks -├── reports <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports -│   └── figures <- Figures for the manuscript or reports -├── src <- Source code for this project -│ ├── data <- scripts and programs to process data -│ ├── external <- Any external source code, e.g., pull other git projects, or external libraries -│ ├── models <- Source code for your own model -│ ├── tools <- Any helper scripts go here -│ └── visualization <- Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related. -└── tests <- Test code for this project -``` \ No newline at end of file diff --git a/swd3_2024/swd3.qmd b/swd3_2024/swd3.qmd index 4a8ffaf..0a1bc6b 100644 --- a/swd3_2024/swd3.qmd +++ b/swd3_2024/swd3.qmd @@ -288,8 +288,6 @@ gitGraph merge new-feature-02 id: "Tests pass still" ``` -## - ## Version control ### Essential git commands @@ -546,15 +544,211 @@ There are other problems with the circle of purely validating numerical models a # Linting and Formatting code +## Coding conventions {.smaller} + +If your language or project has a standard policy, use that. For example: + +- Python: [PEP8](https://www.python.org/dev/peps/pep-0008/) +- R: [Google's guide for R](https://google.github.io/styleguide/Rguide.xml), [tidyverse style guide](https://style.tidyverse.org/) +- C++: [Google's style guide](https://google.github.io/styleguide/cppguide.html) +- Julia: [Official style guide](https://docs.julialang.org/en/v1/manual/style-guide/index.html) + +## Linters + +Linters are automated tools which enforce coding conventions and check for +common mistakes. 
For example: + +- Python: + - [flake8](https://flake8.pycqa.org/en/latest/index.html) (flags any syntax/style errors) + - [black](https://black.readthedocs.io/) (enforces the style) + - [isort](https://pycqa.github.io/isort/) ("Sorts" imports alphabetically in groups) + +## Example: Flake8 Linter + +```bash +$ conda install flake8 +$ flake8 myscript.py +myscript.py:2:6: E201 whitespace after '{' +myscript.py:2:11: E231 missing whitespace after ':' +myscript.py:2:14: E231 missing whitespace after ',' +myscript.py:2:18: E231 missing whitespace after ':' +myscript.py:3:1: E128 continuation line under-indented for visual indent +myscript.py:3:4: E231 missing whitespace after ':' +myscript.py:4:13: E225 missing whitespace around operator +myscript.py:4:14: E222 multiple spaces after operator +myscript.py:5:1: E302 expected 2 blank lines, found 0 +myscript.py:5:13: E201 whitespace after '(' +myscript.py:5:25: E202 whitespace before ')' +myscript.py:6:4: E111 indentation is not a multiple of 4 +myscript.py:6:9: E211 whitespace before '(' +myscript.py:6:20: E202 whitespace before ')' +myscript.py:7:8: E111 indentation is not a multiple of 4 +myscript.py:7:14: E271 multiple spaces after keyword +myscript.py:7:25: E225 missing whitespace around operator +myscript.py:8:4: E301 expected 1 blank line, found 0 +myscript.py:8:4: E111 indentation is not a multiple of 4 +myscript.py:8:17: E203 whitespace before ':' +myscript.py:8:18: E231 missing whitespace after ':' +myscript.py:9:8: E128 continuation line under-indented for visual indent +myscript.py:9:9: E203 whitespace before ':' +myscript.py:9:15: E252 missing whitespace around parameter equals +myscript.py:9:16: E252 missing whitespace around parameter equals +myscript.py:10:8: E124 closing bracket does not match visual indentation +myscript.py:10:8: E125 continuation line with same indent as next logical line +myscript.py:11:8: E111 indentation is not a multiple of 4 +myscript.py:12:1: E302 expected 2 blank lines, found 0 
+myscript.py:12:6: E211 whitespace before '(' +myscript.py:12:9: E201 whitespace after '(' +myscript.py:12:13: E202 whitespace before ')' +myscript.py:12:15: E203 whitespace before ':' +myscript.py:13:4: E111 indentation is not a multiple of 4 +myscript.py:13:10: E271 multiple spaces after keyword +myscript.py:13:26: E203 whitespace before ':' +myscript.py:13:34: W291 trailing whitespace +``` + +## Linters and Formatters + - This is the equivalent of spellchecker for your code - Do yourself a favour and ensure whatever IDE you are using has this enabled! +- I prefer having the linter run while you code, rather than running after, but this is personal preference +- Many different tools available, we will have some preloaded in our devcontainer -You can also format the code after the fact: +You can see what the Black code formatter will do to your code here: - [Black formatter](https://black.vercel.app/){preview-link="true"} +# Dependencies and Virtual Environments + +## Virtual Environments {.smaller} + +If application A needs version 1.0 of a particular module but application B +needs version 2.0, then the requirements are in conflict and installing either +version 1.0 or 2.0 will leave one application unable to run. + +The solution for this problem is to create a virtual environment, a +self-contained directory tree that contains installation for particular versions +of software/packages. + +### Conda + +- [Conda](https://docs.conda.io/en/latest/) is an open source package management +system and environment management system that runs on Windows, macOS, and Linux. +- It offers dependency and environment management for any language—Python, R, +Ruby, Lua, Scala, Java, JavaScript, C/ C++, Fortran, and more. +- Easy user install via [Anaconda](https://www.anaconda.com/download). 
+- We will be using the minimal [MiniForge installation](https://github.com/conda-forge/miniforge/blob/main/README.md) in our devcontainer + # Documentation -**Comment your code!** +## Commenting your code + +- The most basic version of documentation is ensuring that your code is well-commented +- This helps make sure you know what's happening in your code + +```python +# Comments should be short, sweet, and to the point +``` + +```python +constant = 1.5 # Comments can be inline too +``` + +- Comments should add additional context, can contain links +- Don't add comments for the sake of commenting + +## Commenting your code + +- Comments to yourself can also help you to outline and plan your code +- You can write pseudocode in comments to help plan functions + +See this example from [RealPython](https://realpython.com/python-comments-guide/): + +```python +from collections import defaultdict + +def get_top_cities(prices): + top_cities = defaultdict(int) + + # For each price range + # Get city searches in that price + # Count num times city was searched + # Take top 3 cities & add to dict + + return dict(top_cities) +``` + +## Commenting for others + +In later iterations of your code you might want to clean up your comments to yourself and formalise your documentation more -# Releases on GitHub \ No newline at end of file +- In functions, you should add a [docstring](https://peps.python.org/pep-0257/#one-line-docstrings) + +Here's an example of a single-line docstring from the [PEP 257 docstring guidelines](https://peps.python.org/pep-0257/#one-line-docstrings) +```python +def kos_root(): + """Return the pathname of the KOS root directory.""" + global _kos_root + if _kos_root: return _kos_root + ... +``` + +## + +Docstrings can be [multiline](https://peps.python.org/pep-0257/#multi-line-docstrings) too: + +```python +def complex(real=0.0, imag=0.0): + """Form a complex number. 
+ + Keyword arguments: + real -- the real part (default 0.0) + imag -- the imaginary part (default 0.0) + """ + if imag == 0.0 and real == 0.0: + return complex_zero +``` + +# Releases on GitHub + +## Releases + +> [Releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#about-releases) are deployable software iterations you can package and make available for a wider audience to download and use. + +- A release takes a snapshot of your entire repository at a specific time, bundles it all into a zipped file, and stamps it with a version number (like v1.2.0), making it easy for you to reference the exact version of your code you used for a scientific project +- You can link your GitHub repository to Zenodo and get a DOI for your releases + +- [Creating a release on GitHub](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#creating-a-release){preview-link="true"} + +# Working with an old project + +## + +You probably already have multiple different projects in progress, and don't have the time or capacity to go back and organise everything as we've explained. + +What can you do when faced with an overwhelmingly messy codebase? + +Apply the DeReLiCT acronym: + +- Dependencies +- Repository +- License +- Citation +- Testing + +Learn more [here](https://derelict.streamlit.app/). 
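For the Dependencies step, one possible starting point is to record what is currently installed in the environment the old project runs in. The sketch below is an illustration and not part of the course material: `pinned_requirements` is a hypothetical helper name, and it relies only on the standard-library `importlib.metadata` module (Python 3.8+) to produce roughly what `pip freeze` prints.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions.

    Hypothetical helper for illustration; not part of any course package.
    """
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions whose metadata is missing or broken
        if name and version:
            lines.add(f"{name}=={version}")
    # Case-insensitive sort so the output is stable and easy to diff
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Print the pins; redirect the output to requirements.txt to record them
    print("\n".join(pinned_requirements()))
```

Redirecting the script's output to `requirements.txt` gives a first, rough record of the environment. For a conda environment, `conda env export` serves the same purpose and also captures non-Python packages.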
+
+## Agenda
+
+| Start time | End time | Duration | Content |
+|---|---|---|---|
+| **10:00** | **10:50** | **50 min** | **Intro presentation** |
+| 10:50 | 11:00 | _10 min_ | _Short break_ |
+| **11:00** | **12:00** | **60 min** | **Version control and project organisation** |
+| 12:00 | 13:00 | _60 min_ | _Lunch_ |
+| **13:00** | **13:50** | **50 min** | **Testing and linting code** |
+| 13:50 | 14:00 | _10 min_ | _Short break_ |
+| **14:00** | **14:45** | **45 min** | **Documentation and automated workflows** |
+| 14:45 | 15:00 | _15 min_ | _Short break_ |
+| **15:00** | **15:45** | **45 min** | **Packaging and releases** |
+| **15:45** | **16:00** | **15 min** | **Questions, wrap-up** |