first commit
hirtanak committed Sep 23, 2019
0 parents commit 998c760
Showing 91 changed files with 8,629 additions and 0 deletions.
101 changes: 101 additions & 0 deletions README.md
@@ -0,0 +1,101 @@
# Azure CycleCloud template for ADVC

## Prerequisites

1. Prepare your ADVC binary.
2. Install the CycleCloud CLI.

## How to install

1. tar zxvf cyclecloud-ADVC<version>.tar.gz
2. cd cyclecloud-ADVC<version>
3. Put the ADVC library/model in the <template>/blobs directory.
4. Put the OSS PBS Pro files in the <template>/blobs directory.
5. Update the "Files" attribute in the "project.ini" file to list your binaries.
6. Run "cyclecloud project upload azure-storage" to upload the template to CycleCloud.
7. Run "cyclecloud import_template -f templates/pbs_extended_nfs_starccm.txt" to register the template with your CycleCloud.
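
The command-line steps above could be wrapped in a small script. This is only a sketch: `VERSION` is a placeholder you must set, `run` and `install_template` are hypothetical helper names, and the script defaults to a dry run that prints the commands instead of executing them. Copying the ADVC and PBS Pro files and editing `project.ini` remain manual steps.

```shell
#!/bin/bash
# Sketch of the install steps. VERSION is a placeholder; DRY_RUN=1
# (the default here) only prints each command instead of running it.
VERSION="${VERSION:-<version>}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unpack the template, enter it, upload it to the "azure-storage"
# locker, and register the template. Stops at the first failing step.
install_template() {
    run tar zxvf "cyclecloud-ADVC${VERSION}.tar.gz" &&
    run cd "cyclecloud-ADVC${VERSION}" &&
    run cyclecloud project upload azure-storage &&
    run cyclecloud import_template -f templates/pbs_extended_nfs_starccm.txt
}

install_template
```

Run it with `DRY_RUN=0` and a real `VERSION` once the blobs directory and `project.ini` are ready.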

## How to run ADVC

1. Check the license server setting.
2. Upload and modify the PBS script file.
3. qsub ~/advcrun.sh (sample below)

<pre><code>
#!/bin/bash
#PBS -j oe
#PBS -l select=2:ncpus=44
NP=88

## Platform MPI
#MPI_ROOT="/shared/home/azureuser/apps/Solver-2018-R1_3/platform_mpi/bin"
#export MPI_HASIC_UDAPL=ofa-v2-ib0
#export MPI_IB_PKEY="0x8008"

# Disable the source command in advc-solver.conf
sed -i -e "s/^source/#source/g" ${HOME}/apps/Solver-2019R1_0r19/etc/advc-solver.conf

# General settings
export ADVC_DIR="/shared/home/azureuser/apps/Solver-2019R1_0r19/bin"
export ALDE_LICENSE_FILE=27000@<Your License Server IP Address>

# MPI settings
export MPI_ROOT="/opt/intel/impi/2018.4.274"
export I_MPI_ROOT=$MPI_ROOT
export I_MPI_DEBUG=9
export I_MPI_FABRICS=shm:ofa # for 2019, use I_MPI_FABRICS=shm:ofi
# H16r
#export I_MPI_FABRICS=shm:dapl
#export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
#export I_MPI_DYNAMIC_CONNECTION=0
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/shared/home/azureuser/apps/Solver-2019R1_0r19/user_lib
source /opt/intel/compilers_and_libraries/linux/mpi/bin64/mpivars.sh

# running config
INPUT=/mnt/exports/shared/home/azureuser/model_v2.adv

cd ${PBS_O_WORKDIR}
${ADVC_DIR}/ADVCSolver ${INPUT} -np ${NP} | tee ADVC-`date +%Y%m%d_%H-%M-%S`.log
</code></pre>
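
The sample hard-codes `NP=88` to match `select=2:ncpus=44`. If you change the node count or VM size, both values have to be updated together. The following sketch derives the process count from the select spec instead. `np_from_select` is a hypothetical helper, and it assumes a spec of the form `<chunks>:ncpus=<n>` with one MPI rank per core.

```shell
#!/bin/bash
# np_from_select is a hypothetical helper, not part of ADVC or PBS Pro.
# It assumes a select spec of the form "<chunks>:ncpus=<n>[:...]" and
# one MPI rank per core.
np_from_select() {
    spec="$1"
    # Everything before the first ":" is the chunk count.
    chunks="${spec%%:*}"
    # Extract the number that follows "ncpus=".
    ncpus="$(printf '%s\n' "${spec}" | sed -n 's/.*ncpus=\([0-9][0-9]*\).*/\1/p')"
    case "${chunks}" in
        ''|*[!0-9]*) echo "unsupported select spec: ${spec}" >&2; return 1 ;;
    esac
    [ -n "${ncpus}" ] || { echo "no ncpus in: ${spec}" >&2; return 1; }
    echo $(( chunks * ncpus ))
}

NP="$(np_from_select "2:ncpus=44")"
echo "NP=${NP}"
```

The `#PBS -l select=...` directive itself is still read by PBS before the script runs, so the spec string passed to the helper has to be kept in sync with it by hand.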

## Known Issues
1. This template supports only a single administrator, so the superuser (the initial Azure CycleCloud user) and the deployment user of this template must be the same user.
2. AutoScale is currently disabled. You have to create the execute nodes and get their IP addresses yourself. In addition, create a hosts file for your execute node environment.
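
The hosts file format is not specified here. As a rough sketch, assuming a plain list with one execute node address per line (a layout commonly used for MPI machine files), a helper like the following could write it. `write_hosts_file` is a hypothetical name, and the addresses in the usage note are placeholders.

```shell
#!/bin/bash
# write_hosts_file is a hypothetical helper. The one-address-per-line
# layout is an assumption; adjust it to what your environment expects.
write_hosts_file() {
    out="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: write_hosts_file <file> <ip> [<ip> ...]" >&2
        return 1
    fi
    # printf repeats the format for every remaining argument.
    printf '%s\n' "$@" > "${out}"
}
```

For example, `write_hosts_file ./hosts 10.0.0.5 10.0.0.6` would produce a two-line file; replace the addresses with the IPs of your own execute nodes.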

# Azure CycleCloud template: ADVC (NFS/PBSPro)

[Azure CycleCloud](https://docs.microsoft.com/en-us/azure/cyclecloud/) is a solution for easily building CAE/HPC/Deep Learning cluster environments on Microsoft Azure.

![Azure CycleCloud deployment and template layout](https://raw.githubusercontent.com/hirtanak/osspbsdefault/master/AzureCycleCloud-OSSPBSDefault.png "Azure CycleCloud deployment and template layout")

For instructions on installing Azure CycleCloud, see [this document](https://docs.microsoft.com/en-us/azure/cyclecloud/quickstart-install-cyclecloud).

This template is for ADVC.
It has the following configuration and features:

1. Installs the OSS PBS Pro job scheduler on the master node
2. Template and image designed for H16r, H16r_Promo, HC44rs, and HB60rs
- Uses OpenLogic CentOS 7.6 HPC
3. The master node hosts an NFS storage server with 512 GB * 2 of disk
- Execute nodes (compute nodes) mount the NFS share
4. The master node's IP address is fixed
- The IP address you connect to does not change when the node is stopped and started again

![Template layout](https://raw.githubusercontent.com/hirtanak/osspbsdefault/master/OSSPBSDefaultDiagram.png "Template layout")

How to install the OSS PBS Default template

Prerequisite: To use the template, the Azure CycleCloud CLI must be installed and configured. Follow [this document](https://docs.microsoft.com/en-us/azure/cyclecloud/install-cyclecloud-cli) to install the CLI and set the FQDN of your deployed Azure CycleCloud server.

1. Download the template
2. Extract it and change into the directory
3. Install the template from the cyclecloud command line
- tar zxvf cyclecloud-ADVC<version>.tar.gz
- cd cyclecloud-ADVC<version>
- cyclecloud project upload azure-storage
- cyclecloud import_template -f templates/pbs_extended_nfs_starccm.txt
4. To remove the template, run the cyclecloud delete_template ADVC command

***
Copyright Hiroshi Tanaka, [email protected], @hirtanak All rights reserved.
Use of this source code is governed by MIT license that can be found in the LICENSE file.
Binary file added blobs/pbspro-client-18.1.4-0.x86_64.rpm
Binary file not shown.
Binary file added blobs/pbspro-execution-18.1.4-0.x86_64.rpm
Binary file not shown.
Binary file added blobs/pbspro-server-18.1.4-0.x86_64.rpm
Binary file not shown.
16 changes: 16 additions & 0 deletions project.ini
@@ -0,0 +1,16 @@
[project]
version = 1.0.1
type = application
name = ADVENTURECluster

# Binary setup: put your binaries in the blobs directory and update the Files setting below accordingly.
[blobs]
Files = pbspro-execution-18.1.4-0.x86_64.rpm, pbspro-server-18.1.4-0.x86_64.rpm, pbspro-client-18.1.4-0.x86_64.rpm
# Sample
#Files = pbspro-execution-18.1.4-0.x86_64.rpm, pbspro-server-18.1.4-0.x86_64.rpm, pbspro-client-18.1.4-0.x86_64.rpm, advcsolver-2019R1.0r19-x86_64-intel_mpi.tar.gz, advcsolver_test_v2-20140929.tar.gz

[config ADVENTURECluster.version]
Required = True
Label = ADVENTURECluster version
Description = Version of ADVENTURECluster to install on the cluster. Package should be named advcsolver-<version>R<revision>-<platform>.tar.gz
DefaultValue = 2019R1.0r19
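
Because the `Files` entry has to match what is actually in the blobs directory, a quick check before uploading can catch a mismatch. The following is a sketch only: `list_blob_files` and `check_blobs` are hypothetical helpers, not CycleCloud commands, and they assume a single uncommented `Files = a, b, c` line as in the project.ini above.

```shell
#!/bin/bash
# list_blob_files and check_blobs are hypothetical helpers, not
# CycleCloud commands.

# Print the entries of the first uncommented "Files =" line, one per line.
list_blob_files() {
    sed -n 's/^Files[[:space:]]*=[[:space:]]*//p' "$1" |
        head -n 1 |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Report every listed file that is absent from the given directory.
# Returns non-zero if at least one file is missing.
check_blobs() {
    ini="$1"
    dir="$2"
    list_blob_files "${ini}" | (
        missing=0
        while IFS= read -r f; do
            [ -n "${f}" ] || continue
            if [ ! -f "${dir}/${f}" ]; then
                echo "missing: ${f}"
                missing=1
            fi
        done
        exit "${missing}"
    )
}
```

From the project root, `check_blobs project.ini blobs` prints one `missing: <file>` line per absent file and returns a non-zero status if any are missing.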
9 changes: 9 additions & 0 deletions specs/default/chef/roles/pbspro_execute_role.rb
@@ -0,0 +1,9 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
name "pbspro_execute_role"
description "PBSPro Execute Role"
run_list("recipe[cshared::client]",
  "recipe[cuser]",
  "recipe[pbspro::execute]",
  "recipe[cganglia::client]")
9 changes: 9 additions & 0 deletions specs/default/chef/roles/pbspro_login_role.rb
@@ -0,0 +1,9 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
name "pbspro_login_role"
description "PBSPro Login Role"
run_list("recipe[cshared::client]",
  "recipe[cuser]",
  "recipe[pbspro::login]",
  "recipe[cganglia::client]")
14 changes: 14 additions & 0 deletions specs/default/chef/roles/pbspro_master_role.rb
@@ -0,0 +1,14 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
name "pbspro_master_role"
description "Open PBSPro Master Role"
run_list("role[scheduler]",
  "recipe[cshared::directories]",
  "recipe[pbspro::skel]",
  "recipe[cuser]",
  "recipe[cshared::server]",
  "recipe[pbspro::scheduler]",
  "recipe[cganglia::server]")

default_attributes "cyclecloud" => { "discoverable" => true }
29 changes: 29 additions & 0 deletions specs/default/chef/site-cookbooks/pbspro/README.md
@@ -0,0 +1,29 @@
Description
====

Requirements
====

Usage
====


Platform
----

Tested on:

Cookbooks
----


Resources and Providers
====

TODO

Attributes
====

TODO
====
35 changes: 35 additions & 0 deletions specs/default/chef/site-cookbooks/pbspro/attributes/default.rb
@@ -0,0 +1,35 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
default[:pbspro][:version] = "18.1.4-0"
default[:pbspro][:slots] = nil

default[:pbspro][:is_grouped] = true

default[:pbspro][:submit_hook][:__comment__] = "This file was generated by serializing node[:cyclecloud][:pbspro][:submit_hook]."
default[:pbspro][:submit_hook][:disable_eager_packing] = true
default[:pbspro][:submit_hook][:enabled] = true

default[:pbspro][:submit_hook][:logging][:level] = "INFO"
default[:pbspro][:submit_hook][:logging][:filename] = "#{node[:cyclecloud][:bootstrap]}/pbs/submit_hook.log"
default[:pbspro][:submit_hook][:logging][:filemode] = "a"
default[:pbspro][:submit_hook][:logging][:format] = "%(asctime)s %(levelname)s %(message)s"


default[:pbspro][:autoscale_hook][:__comment__] = "This file was generated by serializing node[:cyclecloud][:pbspro][:autostart_hook]."
default[:pbspro][:autoscale_hook][:src_dirs] = [
  "#{node[:cyclecloud][:home]}/system/embedded/lib/python2.7/site-packages",
  "#{node[:cyclecloud][:bootstrap]}/pbs"
]
# pass through to the hook config json file
default[:pbspro][:autoscale_hook][:cyclecloud_home] = node[:cyclecloud][:home]
default[:pbspro][:autoscale_hook][:autostart_log_level] = "DEBUG"
default[:pbspro][:autoscale_hook][:autostart_log_file_level] = "DEBUG"

if node[:cyclecloud][:node][:template] == "master"
  default[:cyclecloud][:cluster][:autoscale][:idle_time_before_jobs] = 3600
  default[:cyclecloud][:cluster][:autoscale][:idle_time_after_jobs] = 300
else
  # The PBS autoscaler should be removing idle nodes; if a node is still idle
  # after four hours, assume something is wrong with the autoscaler.
  default[:cyclecloud][:cluster][:autoscale][:idle_time_before_jobs] = 4 * 3600
  default[:cyclecloud][:cluster][:autoscale][:idle_time_after_jobs] = 4 * 3600
end