Machine Learning on Local or Cloud-based NVidia or Apple GPUs

Introduction

This blog details various configurations for running machine learning software for LLM and general AI applications on a variety of hardware, including NVidia professional workstation GPUs (local or cloud) and local Apple ARM hardware.
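As a minimal sketch of how the same training code can target all of these backends (assuming PyTorch is installed; the pick_device helper below is illustrative, not part of this wiki), device selection at startup can look like this:

```python
# Minimal device-selection sketch (assumes PyTorch >= 1.12 for MPS support).
# pick_device is an illustrative helper, not part of this wiki's tooling.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():           # NVidia GPU, local or cloud
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple Silicon (ARM) GPU
        return torch.device("mps")
    return torch.device("cpu")              # fallback

print(f"Running on: {pick_device()}")
```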

Performance

Batch size variations among GPUs (shorter time per iteration is better). Notice that a dual-GPU setup is performant only at large batch sizes above 1024, and that performance correlates with GPU core count - in this case 32768 cores for the dual RTX-4090.

[Figure: time per iteration vs. batch size across the GPU configurations listed below]
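One way to reproduce this kind of measurement is to time a fixed number of training iterations at each batch size, wrapping the model in DataParallel when more than one GPU is present. The sketch below assumes PyTorch; the model, layer sizes, and batch sizes are illustrative toys, not the exact benchmark behind the chart.

```python
# Toy batch-size timing benchmark (assumes PyTorch; model/sizes are illustrative).
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # split each batch across all visible GPUs

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def sync():
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # wait for queued GPU work so timings are accurate

for batch_size in (256, 512, 1024, 2048, 4096):
    x = torch.randn(batch_size, 4096, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)
    sync()
    start = time.time()
    for _ in range(10):              # average over 10 iterations
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    sync()
    print(f"batch {batch_size}: {(time.time() - start) / 10:.3f} s/iteration")
```

Because DataParallel splits each batch along the batch dimension, small batches leave each card underutilized and the inter-GPU overhead dominates, which is consistent with the crossover above batch size 1024 in the chart.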

Quickstart

Setup

Architecture

DevOps

Example ML Systems

2023 Lenovo P1 Gen 6 : i7-13800H 64G and NVidia RTX-3500 Ada AD104 5120 cores 12G 192-bit VRAM

2020 Lenovo P17 Gen 1 : Xeon W-10855M 128G and NVidia Quadro RTX-5000 TU104 Turing 3072 cores 16G 256-bit VRAM

2023 Custom : i9-13900K 192G and Dual NVidia RTX-4090 MSI Suprim Liquid X

2023 Custom : i9-13900K 128G and Dual NVidia RTX-A4500 with NVidia RTX-A4000

2021 Lenovo X1 Carbon gen 9 : Intel GPU

Google Cloud Workstation : NVidia L4 GPU

Google Pixel 6 : Google TPU

Links

PMLE Training

PMLE Notes

Hardware

AD102 RTX-4090 Ada Consumer

  • 24GB, 384-bit, 1008 GB/s, 16384 cores, 76B transistors, 1344 GTexel/s

AD104 RTX-3500 Ada Mobile Workstation P1Gen6 2023

  • 12GB, 192-bit, 432 GB/s, 5120 cores, 35B transistors, 319 GTexel/s

RTX-A4500 Ampere Workstation 2021

  • 20GB

RTX-A4000 Ampere Workstation 2021

  • 16GB

TU104 RTX-5000 Turing Mobile Workstation P17Gen1 2020

  • 16GB
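
The VRAM sizes listed above can be cross-checked on any of the CUDA machines; the following is a minimal sketch assuming PyTorch with CUDA support:

```python
# Sketch: enumerate detected NVidia GPUs and their memory (assumes PyTorch + CUDA).
import torch

for i in range(torch.cuda.device_count()):
    p = torch.cuda.get_device_properties(i)
    # multi_processor_count is SMs; CUDA cores per SM depend on the architecture
    print(f"GPU {i}: {p.name}, {p.total_memory / 2**30:.1f} GiB VRAM, "
          f"{p.multi_processor_count} SMs")
```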