Machine Learning on local or Cloud based NVidia or Apple GPUs
This blog details various configurations for running machine learning software for LLM and general AI applications on a variety of hardware, including NVidia professional workstation GPUs (local or in the cloud) and local Apple ARM hardware.
Batch size variations among GPUs (shorter time per iteration is better). Notice that a dual-GPU setup is performant only for large batch sizes over 1024, and that performance correlates with GPU core count: in this case 32768 cores (2 x 16384) for the dual RTX-4090.
- https://obrienlabs.medium.com/running-the-larger-google-gemma-7b-35gb-llm-for-7x-inference-performance-gain-8b63019523bb
- https://github.com/ObrienlabsDev/machine-learning/issues
- https://github.com/ObrienlabsDev/blog/issues/13
- https://github.com/ObrienlabsDev/blog/issues/9
- Order the 64G laptop not the 96G version for now https://forums.lenovo.com/t5/ThinkPad-P-and-W-Series-Mobile-Workstations/P1-Gen6-Bricked-after-BSOD-second-laptop-with-the-same-problem/m-p/5254145?page=2#6148028
2019 Lenovo P17 Gen 1 : Xeon W-10855M 128G and NVidia Quadro RTX-5000 TU104 Turing 3072 cores 16G 256bit VRam
- Machine Learning Crash Course https://developers.google.com/machine-learning/crash-course/representation/cleaning-data
- learn gradient ascent and expand the partial derivative section - "the negative of the gradient vector points into the valley" https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent
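
The quoted idea, that the negative of the gradient points into the valley, can be sketched in a few lines of plain Python. This is only an illustration and not code from the crash course: the bowl-shaped `loss`, its hand-derived `gradient`, the starting point and the learning rate are all assumptions picked for the example.

```python
def loss(w, b):
    """Example bowl-shaped loss (an assumption) with its minimum at w=3, b=-1."""
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def gradient(w, b):
    """Partial derivatives of loss with respect to w and b."""
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)


def gradient_descent(w, b, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. downhill into the valley."""
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


w, b = gradient_descent(0.0, 0.0)
print(w, b, loss(w, b))
```

Gradient ascent is the same loop with the sign flipped (`+=` instead of `-=`), which climbs toward a maximum instead of descending toward a minimum.
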
- deep field before deep learning https://esahubble.org/images/heic0611b/ https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Hubble+Ultra+Deep+Field
- https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
- tree classifier using area under the curve - https://dmip.webs.upv.es/papers/ICML2002presentation.pdf - the greater AUC means better positive/negative classification
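
To make the AUC note concrete, here is a small sketch that computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `auc` and the toy labels and scores are assumptions for illustration; the linked presentation may compute it differently (for example from the ROC curve).

```python
def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]            # toy ground truth (assumption)
scores = [0.1, 0.4, 0.35, 0.8]   # toy classifier scores (assumption)
print(auc(labels, scores))       # 3 of the 4 pairs are ranked correctly -> 0.75
```

A value of 1.0 means every positive is ranked above every negative, while 0.5 is what random scoring would give.
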
- XGBoost - https://xgboost.readthedocs.io/en/stable/tutorials/model.html https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/#:~:text=XGBoost%20is%20a%20machine%20learning,won%20several%20machine%20learning%20competitions.
- https://codelabs.developers.google.com/vertex_notebook_executor#0
- https://www.tensorflow.org/guide/tpu#distribution_strategies
- TPU nodes(gRPC)/VMs(ssh) and twisted topology https://cloud.google.com/tpu/docs/system-architecture-tpu-vm
- TPU V4 up to 2048 TPU cores - https://cloud.google.com/tpu/docs/supported-tpu-configurations
- JAX Autograd (automated gradient function) and XLA (Accelerated Linear Algebra - see cuBLAS) https://jax.readthedocs.io/en/latest/
- https://neptune.ai/blog/retraining-model-during-deployment-continuous-training-continuous-testing
- hashing or homomorphic encryption https://fastdatascience.com/sensitive-data-machine-learning-model/
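
For the hashing option only (homomorphic encryption is not shown), one possible sketch is to replace a sensitive identifier with a keyed hash before it reaches the training data. The choice of HMAC-SHA256, the helper name `pseudonymize`, and the example key and values are assumptions; the linked article may recommend a different scheme.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable hex token for value; the key makes guessing inputs harder."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-not-for-production"   # hypothetical key, keep real keys secret
token_a = pseudonymize("alice@example.com", key)
token_b = pseudonymize("bob@example.com", key)
print(token_a)
print(token_b)
```

The same input always maps to the same token, so joins and group-bys still work on the hashed column, but the original value is not stored with the dataset.
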
- TensorFlow Data Validation and Pandas https://www.tensorflow.org/tfx/data_validation/get_started
- TensorFlow from Google Brain https://en.wikipedia.org/wiki/TensorFlow#TensorFlow
- Batch and Streaming data processing https://beam.apache.org/
- https://medium.com/mlpoint/pandas-for-machine-learning-53846bc9a98b
- training with mini-batch gradient descent https://towardsdatascience.com/batch-mini-batch-stochastic-gradient-descent-7a62ecba642a
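
As a rough sketch of mini-batch gradient descent, the loop below fits a one-variable linear model by updating the weights after each small batch rather than after the whole dataset. The synthetic data (`y = 2x + 1`), the function name and all hyperparameters are assumptions chosen so that the example runs with the standard library only.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size=4, learning_rate=0.05,
                               epochs=300, seed=0):
    """Fit y ~ w*x + b using mean-squared-error gradients on shuffled mini-batches."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new batch composition each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]
                grad_b += 2.0 * error
            # Average the gradient over the batch, then take one step.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b


xs = [i / 10.0 for i in range(20)]        # synthetic inputs 0.0 .. 1.9 (assumption)
ys = [2.0 * x + 1.0 for x in xs]          # noise-free targets from y = 2x + 1
w, b = minibatch_gradient_descent(xs, ys)
print(w, b)
```

Setting `batch_size=1` turns this into stochastic gradient descent, and setting it to `len(xs)` gives full-batch gradient descent.
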
- https://en.wikipedia.org/wiki/Regularization_%28mathematics%29
- training with L1 regularization (prevent overfitting) https://towardsdatascience.com/regularization-in-deep-learning-l1-l2-and-dropout-377e75acc036
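
The effect of L1 regularization can be illustrated with two small helpers: the penalty term that is added to the loss, and a soft-thresholding step that shrinks weights and sets small ones to exactly zero. Soft-thresholding is one common way to apply an L1 penalty and is an assumption here, not necessarily what the linked article uses; the weights and the `strength` value are also made up for the example.

```python
def l1_penalty(weights, strength):
    """L1 term added to the training loss: strength * sum of |w|."""
    return strength * sum(abs(w) for w in weights)


def soft_threshold(weight, threshold):
    """Shrink a weight toward zero; weights within the threshold become exactly 0."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


weights = [0.8, -0.05, 0.02, -1.5]   # hypothetical model weights
strength = 0.1                       # hypothetical regularization strength
print(l1_penalty(weights, strength))
shrunk = [soft_threshold(w, strength) for w in weights]
print(shrunk)
```

The two small weights end up at zero while the large ones are only reduced, which is the sparsity behaviour that makes L1 useful against overfitting on wide feature sets.
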
- small normalized wide dataset (reduce feature scaling for training) https://developers.google.com/machine-learning/data-prep/transform/normalization
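
Two of the usual normalization techniques, min-max scaling and z-score scaling, can be sketched with the standard library as follows. The function names and the sample column are assumptions for illustration, and a real pipeline would compute the statistics on the training split only.

```python
import statistics


def min_max_scale(values):
    """Linearly rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


feature = [10, 20, 30]                    # hypothetical raw feature column
print(min_max_scale(feature))
print(z_score(feature))
```
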
- PCA https://www.analyticsvidhya.com/blog/2022/07/principal-component-analysis-beginner-friendly/
- reduce ML latency https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#optimizing_models_for_serving
- https://www.tensorflow.org/guide/keras/serialization_and_saving
- https://cloud.google.com/vertex-ai/docs/model-registry/introduction
- https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
- https://cloud.google.com/vertex-ai/docs/workbench/managed/schedule-managed-notebooks-run-quickstart
- https://cloud.google.com/vertex-ai/docs/pipelines/run-pipeline
- https://cloud.google.com/architecture/setting-up-mlops-with-composer-and-mlflow
- https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus
- https://www.tensorflow.org/tutorials/distribute/multi_worker_with_ctl
- https://cloud.google.com/dlp/docs/transformations-reference#transformation_methods
- https://cloud.google.com/blog/products/identity-security/next-onair20-security-week-session-guide
- https://cloud.google.com/tensorflow-enterprise/docs/overview
- https://developers.google.com/machine-learning/crash-course/representation/cleaning-data
- https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic
- https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
- https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview
- https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models
- https://developers.google.com/machine-learning/glossary#baseline
- https://cloud.google.com/ai-platform/training/docs/training-at-scale
- https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers
- https://cloud.google.com/vertex-ai/docs/training/distributed-training
- https://cloud.google.com/ai-platform/training/docs/overview#distributed_training_structure
- https://cloud.google.com/vertex-ai/docs/featurestore/overview#benefits
- https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
- https://cloud.google.com/memorystore/docs/redis/redis-overview
- https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview
- https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
- https://cloud.google.com/vertex-ai/docs/pipelines/visualize-pipeline
- https://cloud.google.com/vertex-ai/docs/model-monitoring/overview
- https://cloud.google.com/architecture/best-practices-for-ml-performance-cost
- https://www.tensorflow.org/lite/performance/model_optimization
- https://www.tensorflow.org/tutorials/images/transfer_learning
- https://developers.google.com/machine-learning/glossary#calibration-layer
- https://developers.google.com/machine-learning/testing-debugging/common/overview
- https://cloud.google.com/bigquery-ml/docs/preventing-overfitting
- https://www.tensorflow.org/tutorials/keras/overfit_and_underfit
- https://cloud.google.com/architecture/implementing-deployment-and-testing-strategies-on-gke
- https://docs.seldon.io/projects/seldon-core/en/latest/analytics/routers.html
- https://www.tensorflow.org/tutorials/customization/custom_layers
- https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
- https://cloud.google.com/vertex-ai/docs/ml-metadata/tracking
- https://cloud.google.com/architecture/ml-on-gcp-best-practices#operationalized-training
- https://cloud.google.com/architecture/ml-on-gcp-best-practices#organize-your-ml-model-artifacts
- 24GB 384 bit 1008 GB/s 16384 cores 76B transistors 1344 GTexels
- 12GB 192 bit 432 GB/s 5120 cores 35B transistors 319 GTexels
- 20GB
- 16GB
- 16GB