Machine Learning on local or cloud-based NVidia or Apple GPUs
This blog details various configurations for running machine learning software for LLM and general AI applications on a variety of hardware: NVidia professional workstation GPUs, either locally or in the cloud, and local Apple ARM hardware.
- Order the 64G laptop, not the 96G version, for now (Lenovo forum thread on P1 Gen 6 units bricked after a BSOD): https://forums.lenovo.com/t5/ThinkPad-P-and-W-Series-Mobile-Workstations/P1-Gen6-Bricked-after-BSOD-second-laptop-with-the-same-problem/m-p/5254145?page=2#6148028
2019 Lenovo P17 Gen 1: Xeon W-10855M, 128G RAM, NVidia Quadro RTX 5000 (TU104 Turing, 3072 cores, 16G 256-bit VRAM)
-
Machine Learning Crash Course https://developers.google.com/machine-learning/crash-course/representation/cleaning-data
-
Gradient descent: learn gradient ascent and expand the partial-derivative section; "the negative of the gradient vector points into the valley" https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent
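The idea that the negative gradient points into the valley can be sketched in plain Python. The bowl-shaped function f(x, y) = (x - 3)^2 + 2(y + 1)^2, the learning rate, and the step count are arbitrary choices for illustration. Gradient ascent is the same loop with the sign flipped; here it maximizes g = -f, whose hilltop sits at the same point as the valley floor of f.

```python
# f(x, y) = (x - 3)**2 + 2 * (y + 1)**2 has its minimum (the "valley") at (3, -1).
def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy)."""
    return 2.0 * (x - 3.0), 4.0 * (y + 1.0)


def grad_g(x, y):
    """Gradient of g = -f, which has its maximum at (3, -1)."""
    gx, gy = grad_f(x, y)
    return -gx, -gy


def optimize(grad, x, y, learning_rate=0.1, steps=200, ascent=False):
    # Descent steps along -gradient (downhill); ascent steps along +gradient (uphill).
    sign = 1.0 if ascent else -1.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        x += sign * learning_rate * gx
        y += sign * learning_rate * gy
    return x, y


x_min, y_min = optimize(grad_f, 0.0, 0.0)               # gradient descent on f
x_max, y_max = optimize(grad_g, 0.0, 0.0, ascent=True)  # gradient ascent on g = -f
```

Both runs approach (3, -1); a learning rate that is too large (here above 0.5 for the y direction) would overshoot the valley and diverge instead.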
-
deep field before deep learning https://esahubble.org/images/heic0611b/ https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Hubble+Ultra+Deep+Field
-
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
-
Decision-tree classifier learning using the area under the ROC curve (AUC): https://dmip.webs.upv.es/papers/ICML2002presentation.pdf. A greater AUC means the classifier separates positive from negative examples better.
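AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it straight from that definition in O(P·N) time. It scores any classifier's outputs and is not the tree-learning method from the linked slides; the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A perfect ranking gives 1.0, random scores hover around 0.5, and a fully inverted ranking gives 0.0.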
-
XGBoost - https://xgboost.readthedocs.io/en/stable/tutorials/model.html https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/#:~:text=XGBoost%20is%20a%20machine%20learning,won%20several%20machine%20learning%20competitions.
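The XGBoost tutorial scores a leaf by summing per-example gradients G and hessians H: the optimal leaf weight is w* = -G / (H + λ), and a candidate split gains ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) - (G_L + G_R)²/(H_L + H_R + λ)] - γ. A minimal sketch of that arithmetic follows, assuming the loss ½(ŷ - y)², for which g = ŷ - y and h = 1. It is not the XGBoost library API; the helper names are hypothetical.

```python
def squared_error_grad_hess(y_true, y_pred):
    # For loss 0.5 * (y_pred - y_true)**2: first derivative g, second derivative h = 1.
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(G, H, lam):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Structure-score gain of splitting one leaf into left/right children."""
    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)

    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Targets 1, 1, 5, 5 all currently predicted as 3.
y = [1.0, 1.0, 5.0, 5.0]
g, h = squared_error_grad_hess(y, [3.0] * 4)
good = split_gain(g[:2], h[:2], g[2:], h[2:])                # {1, 1} | {5, 5}
bad = split_gain([g[0], g[2]], [1.0, 1.0], [g[1], g[3]], [1.0, 1.0])  # {1, 5} | {1, 5}
```

Separating the 1s from the 5s yields a gain of 16/3, while the mixed split yields 0. The left leaf weight -4/3 moves its prediction from 3 toward 1, shrunk by λ.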
-
https://codelabs.developers.google.com/vertex_notebook_executor#0
-
https://www.tensorflow.org/guide/tpu#distribution_strategies
-
TPU Nodes (gRPC) vs. TPU VMs (SSH) and the twisted-torus topology https://cloud.google.com/tpu/docs/system-architecture-tpu-vm
-
TPU V4 up to 2048 TPU cores - https://cloud.google.com/tpu/docs/supported-tpu-configurations
-
JAX: Autograd (automatic differentiation of Python functions) and XLA (Accelerated Linear Algebra compiler; compare cuBLAS) https://jax.readthedocs.io/en/latest/
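JAX's `jax.grad` applies reverse-mode automatic differentiation to Python functions. The sketch below is not JAX: it swaps in the simpler forward-mode technique (dual numbers) to show how exact derivatives fall out of the chain rule rather than finite differences. `Dual`, `sin`, and `derivative` are hypothetical names for this illustration.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 == 0; deriv carries d/dx."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule: d/dx sin(u) = cos(u) * u'.
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def derivative(f):
    """Return x -> f'(x) for a scalar function built from the operations above."""
    def df(x):
        return f(Dual(x, 1.0)).deriv  # seed dx/dx = 1
    return df


def f(x):
    return 3 * x * x + sin(x) - 2


df = derivative(f)  # analytically f'(x) = 6x + cos(x)
```

For many inputs and one scalar output (a loss), reverse mode needs one backward pass for the full gradient, which is why `jax.grad` uses it; forward mode needs one pass per input.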
-
https://neptune.ai/blog/retraining-model-during-deployment-continuous-training-continuous-testing
-
hashing or homomorphic encryption https://fastdatascience.com/sensitive-data-machine-learning-model/
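For the hashing option, a common pattern is to replace direct identifiers with a keyed hash (HMAC) so records can still be joined and counted without exposing the raw value. A plain unsalted hash of an email address can be reversed by hashing candidate addresses; a secret key blocks that as long as the key stays secret. Homomorphic encryption needs a specialized library and is not shown. The key and records below are made up.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a deterministic HMAC-SHA256 token (64 hex chars)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice load it from a secret manager, never hard-code it.
key = b"example-key-do-not-use"
rows = [
    {"email": "alice@example.com", "amount": 12.5},
    {"email": "bob@example.com", "amount": 7.0},
    {"email": "alice@example.com", "amount": 3.25},
]
masked = [{**row, "email": pseudonymize(row["email"], key)} for row in rows]
```

The same input always maps to the same token, so per-user aggregates and joins still work on the masked data.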
-
TensorFlow Data Validation and Pandas https://www.tensorflow.org/tfx/data_validation/get_started
-
TensorFlow from Google Brain https://en.wikipedia.org/wiki/TensorFlow#TensorFlow
-
Batch and Streaming data processing https://beam.apache.org/
-
https://medium.com/mlpoint/pandas-for-machine-learning-53846bc9a98b
-
training with mini-batch gradient descent https://towardsdatascience.com/batch-mini-batch-stochastic-gradient-descent-7a62ecba642a
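Mini-batch gradient descent updates the weights after each small batch, rather than after the full dataset (batch) or after every single example (stochastic). A sketch fitting y = w·x + b by mean squared error on noise-free synthetic data follows; the batch size, learning rate, and epoch count are arbitrary assumptions.

```python
import random


def minibatch_sgd(xs, ys, batch_size=4, learning_rate=0.05, epochs=500, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new batch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            n = len(batch)
            w -= learning_rate * grad_w / n
            b -= learning_rate * grad_b / n
    return w, b


xs = [i / 10 for i in range(20)]   # 0.0 .. 1.9
ys = [2.0 * x + 1.0 for x in xs]   # true line: w = 2, b = 1
w, b = minibatch_sgd(xs, ys)
```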
-
https://en.wikipedia.org/wiki/Regularization_%28mathematics%29
-
training with L1 regularization (prevent overfitting) https://towardsdatascience.com/regularization-in-deep-learning-l1-l2-and-dropout-377e75acc036
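L1 regularization adds λ·Σ|w_j| to the loss, which tends to push the weights of unhelpful features to exactly zero. One standard way to train with it is proximal gradient descent (ISTA): an ordinary gradient step on the data loss, followed by soft-thresholding. The sketch assumes the loss (1/2n)·Σ(ŷ - y)² and a tiny hand-made dataset whose second feature barely matters; it is an illustration, not the method from the linked article.

```python
def soft_threshold(w, t):
    """Proximal step for t * |w|: shrink toward 0 and snap to exactly 0 if |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def lasso_ista(X, y, lam=0.1, learning_rate=0.1, steps=2000):
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of the data loss (1/2n) * sum((X w - y)**2).
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        # Gradient step, then the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - learning_rate * gj, learning_rate * lam)
             for wj, gj in zip(w, grad)]
    return w


# Orthogonal features; y = 3 * x0 + 0.05 * x1, so x1 contributes very little.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.05, 2.95, -2.95, -3.05]
weights = lasso_ista(X, y)
```

Plain least squares would give [3.0, 0.05] here; with λ = 0.1 the L1 penalty shrinks the first weight to 2.9 and sets the second to exactly 0.0.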
-
normalizing a small, wide dataset (transform features to a similar scale for training) https://developers.google.com/machine-learning/data-prep/transform/normalization
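Two of the transforms on the linked normalization page, min-max scaling to [0, 1] and z-score scaling, sketched in plain Python. The feature values are made up. In practice the min/max or mean/standard deviation should be computed on the training data and reused unchanged at serving time, otherwise training and serving see differently scaled inputs.

```python
import statistics


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50]
incomes = [30000, 45000, 60000, 150000]
scaled_ages = min_max_scale(ages)
scaled_incomes = z_score(incomes)
```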
-
PCA https://www.analyticsvidhya.com/blog/2022/07/principal-component-analysis-beginner-friendly/
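The first principal component is the top eigenvector of the data's covariance matrix. A minimal sketch finds it with power iteration (repeatedly multiplying a vector by the covariance matrix and renormalizing), which is one simple method; libraries such as scikit-learn use an SVD instead. The five 2-D points are made up and lie close to the line y = x.

```python
import math


def first_principal_component(points, iterations=100):
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v = variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
component, variance = first_principal_component(points)
```

Here the component is roughly (0.716, 0.698), close to the y = x direction, and it captures over 99% of the total variance, so projecting onto it reduces the data to one dimension with little loss.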
-
reduce ML latency https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning#optimizing_models_for_serving
-
https://www.tensorflow.org/guide/keras/serialization_and_saving
-
https://cloud.google.com/vertex-ai/docs/model-registry/introduction
-
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
-
https://cloud.google.com/vertex-ai/docs/workbench/managed/schedule-managed-notebooks-run-quickstart
-
https://cloud.google.com/vertex-ai/docs/pipelines/run-pipeline
-
https://cloud.google.com/architecture/setting-up-mlops-with-composer-and-mlflow
-
https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus
-
https://www.tensorflow.org/tutorials/distribute/multi_worker_with_ctl
-
https://cloud.google.com/dlp/docs/transformations-reference#transformation_methods
-
https://cloud.google.com/blog/products/identity-security/next-onair20-security-week-session-guide
-
https://cloud.google.com/tensorflow-enterprise/docs/overview
-
https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic
-
https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
-
https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview
-
https://cloud.google.com/automl-tables/docs/evaluate#evaluation_metrics_for_regression_models
-
https://developers.google.com/machine-learning/glossary#baseline
-
https://cloud.google.com/ai-platform/training/docs/training-at-scale
-
https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers
-
https://cloud.google.com/vertex-ai/docs/training/distributed-training
-
https://cloud.google.com/ai-platform/training/docs/overview#distributed_training_structure
-
https://cloud.google.com/vertex-ai/docs/featurestore/overview#benefits
-
https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving
-
https://cloud.google.com/memorystore/docs/redis/redis-overview
-
https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview
-
https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction
-
https://cloud.google.com/vertex-ai/docs/pipelines/visualize-pipeline
-
https://cloud.google.com/vertex-ai/docs/model-monitoring/overview
-
https://cloud.google.com/architecture/best-practices-for-ml-performance-cost
-
https://www.tensorflow.org/lite/performance/model_optimization
-
https://www.tensorflow.org/tutorials/images/transfer_learning
-
https://developers.google.com/machine-learning/glossary#calibration-layer
-
https://developers.google.com/machine-learning/testing-debugging/common/overview
-
https://cloud.google.com/bigquery-ml/docs/preventing-overfitting
-
https://www.tensorflow.org/tutorials/keras/overfit_and_underfit
-
https://cloud.google.com/architecture/implementing-deployment-and-testing-strategies-on-gke
-
https://docs.seldon.io/projects/seldon-core/en/latest/analytics/routers.html
-
https://www.tensorflow.org/tutorials/customization/custom_layers
-
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda
-
https://cloud.google.com/vertex-ai/docs/ml-metadata/tracking
-
https://cloud.google.com/architecture/ml-on-gcp-best-practices#operationalized-training
-
https://cloud.google.com/architecture/ml-on-gcp-best-practices#organize-your-ml-model-artifacts