eBay

GPTshop.ai presents: the ultimate high-end desktop supercomputers for AI GPT LLM ML and HPC. Run and tune the biggest GPT large language models locally. Get our mind-blowing NVIDIA GH200 Grace Hopper superchip systems. We are very proud to announce the world's first and only Nvidia GH200 Grace Hopper Superchip-powered supercomputer in a quiet, handy and beautiful desktop form factor. Our benchmarks show that it is currently by far the fastest AI and by far the fastest ARM desktop PC in the world. If you are looking for a workstation for inferencing and fine-tuning of insanely huge LLMs, we got you covered.

More info and configuration: https://gptshop.ai

Example use case 1: Inferencing Llama-3 400B

Download: https://ai.meta.com/blog/meta-llama-3/

Llama-3 400B (to be released) will be the most powerful open-source model by far.

Llama-3 400B with 8bit quantization needs at least 400GB of memory to swiftly run inference! Luckily, GH200 has a minimum of 576GB.

Example use case 2: Fine-tuning Llama-3 400B with PyTorch FSDP and Q-Lora

Tutorial: https://www.philschmid.de/fsdp-qlora-llama3

Models need to be fine-tuned on your data to unlock the full potential of the model. But efficiently fine-tuning bigger models like Llama 3 400B remained a challenge until now. This blog post walks you through how to fine-tune Llama 3 using PyTorch FSDP and Q-Lora with the help of Hugging Face TRL, Transformers, peft & datasets.

Fine-tuning big models within a reasonable time requires special and beefy hardware! Luckily, GH200 is ideal for this task.

Example use case 3: Generate videos with StoryDiffusion

Github: https://storydiffusion.github.io/

StoryDiffusion creates comics in various styles through the proposed consistent self-attention, maintaining consistent character styles and attires for cohesive storytelling.

Generating videos with StoryDiffusion requires special and beefy hardware! Luckily, GH200 is ideal for this task.

Why should you buy your own hardware?

"You'll own nothing and you'll be happy?" No!!! Never should you bow to Satan and rent stuff that you can own. In other areas, renting stuff that you can own is very uncool and uncommon. Or would you prefer to rent "your" car instead of owning it? Most people prefer to own their car, because it's much cheaper, it's an asset that has value and it makes the owner proud and happy. The same is true for compute infrastructure.

Even more so, because data and compute infrastructure are of great value and importance and are preferably kept on premises, not only for privacy reasons but also to keep control and mitigate risks. If somebody else has your data and your compute infrastructure you are in big trouble.

Speed, latency and ease-of-use are also much better when you have direct physical access to your stuff.

With respect to AI and specifically LLMs there is another very important aspect. The first thing big tech taught their closed-source LLMs was to be "politically correct" (lie) and implement guardrails, "safety" and censorship to such an extent that the usefulness of these LLMs is severely limited. Luckily, the (open-source) tools are out there to build and tune AI that is really intelligent and really useful. But first, you need your own hardware to run it on.

What are the main benefits of GH200 Grace-Hopper?

Its performance in every regard is almost unreal (up to 284 times faster than x86).

Much cheaper than alternative systems with the same amount of memory.

It has enough memory to run the biggest LLMs currently available.

Optimized for memory-intensive inference and HPC performance.

Ideal for AI, especially inference and fine-tuning of LLMs.

Ideal for HPC applications like, e.g. genome sequencing.

Connect display and keyboard, and you are ready to go.

You can use it as a server or a desktop/workstation.

Easily customizable, upgradable and repairable.

Privacy and independence from cloud providers.

Cheaper and much faster than cloud providers.

Flexibility and the possibility of offline use.

Perfect for edge AI ML GPT LLM and HPC.

Gigantic amounts of coherent memory.

No special infrastructure is needed.

The lowest possible latency.

It is very power-efficient.

It is easy to transport.

CUDA enabled.

It is very quiet.

It is beautiful.

Runs Linux.

What is the difference to alternative systems with the same amount of memory?

Compared to a 8x Nvidia H100 system, GH200 costs 5x less, consumes 10x less energy and has very roughly the same performance.

Compared to a 8x Nvidia A100 system, GH200 costs 3x less, consumes 5x less energy and has at least the same performance.

Compared to a 4x AMD Mi300X system, GH200 costs 3x less, consumes 4x less energy and has roughly the same performance.

Compared to a 4x AMD Mi300A system (which has only 512 GB memory, more is not possible because the maximum number of scale-up infinity links is 4), GH200 costs significantly less, consumes 3x less energy and has at least the same performance.

Compared to a 8x Nvidia RTX A6000 Ada system which has significantly less memory (only 384GB), GH200 costs significantly less, consumes 3x less energy and has a higher performance.

Compared to a 8x AMD Radeon PRO W7900 system which has significantly less memory (only 384GB), GH200 costs the same, consumes 3x less energy and has a higher performance.

The main difference between GH200 and alternative systems is that with GH200, the GPU is connected to the CPU via a 900 GB/s NVLink vs. 128 GB/s PCIe gen5 used by traditional systems. Furthermore, multiple superchips can be connected via 900 GB/s NVLink vs. orders of magnitudes slower network connections used by traditional systems. Since these are the main bottlenecks, GH200's high-speed connections directly translate to much higher performance compared to traditional architectures.

The alternative systems mentioned above also have one thing in common: they are not available in standard desktop form factors, like our GH200 systems are.

What is the difference to 19-inch server models?

Form factor: 19-inch servers have a very distinct form factor. They are of low height and are very long, e.g. 438 x 87.5 x 900mm (17.24" x 3.44" x 35.43"). This makes them rather unsuitable to place them anywhere else than in a 19-inch rack. Our GH200 and Grace tower models have desktop form factors: 244 x 567 x 523 mm (20.6 x 9.6 x 22.3") or 255 x 565 x 530 mm (20.9 x 10 x 22.2") or 250 x 404 x 359 mm (9.8 x 15.9 x 14.1"). This makes it possible to place them almost anywhere.

Noise: 19-inch servers are extremely loud. The average noise level is typically around 90 decibels, which is as loud as a subway train and exceeds the noise level that is considered safe for workers subject to long-term exposure. In contrast, our GH200 and Grace tower models are very quiet (factory setting is 25 decibels) and they can easily be adjusted to even lower or higher noise levels because each fan can be tuned individually and manually from 0 to 100% PWM duty cycle. Efficient cooling is ensured, because our GH200 tower models have a higher number of fans and the low-revving Noctua fans have a much bigger diameter compared to their 19-inch counterparts and move approximately the same amount of air or even a much higher amount depending on the specific configuration and PWM tuning.

Transportability: 19-inch servers are not meant to be transported, consequently, they lack every feature in this regard. In addition, their form factor makes them rather unsuitable to be transported. Our GH200 tower models, in contrast, can be transported very easily. Our metal and mini cases even feature two handles, which makes moving them around very easy.

Infrastructure: 19-inch servers typically need quite some infrastructure to be able to be deployed. At the very least, a 19-inch mounting rack is definitely required. Our GH200 models do not need any special infrastructure at all. They can be deployed quickly and easily almost everywhere.

Latency: 19-inch servers are typically accessed via network. Because of this, there is always at least some latency. Our GH200 tower models can be used as desktops/workstations. In this use case, latency is virtually non-existent.

Looks: 19-inch server models are not particularly aesthetically pleasing. In contrast, our available case options are in our humble opinion by far the most beautiful there are.

Technical details of our GH200 workstations (base configuration)

Metal tower with two color choices: Titan grey and Champagne gold

Glass tower with four color choices: white, black, green or turquoise

Mini tower with two color choices: white and black

Available air or liquid-cooled

1x/2x Nvidia GH200 Grace Hopper Superchip

1x/2x 72-core Nvidia Grace CPU

1x/2x Nvidia H100 Tensor Core GPU

1x/2x Nvidia H200 Tensor Core GPU

480GB/960GB of LPDDR5X memory with error-correction code (ECC)

96GB of HBM3 or 144GB of HBM3e or 1248GB of HBM3e memory

576GB or 624GB or 1248GB of total fast-access memory

NVLink-C2C: 900 GB/s of bandwidth

Programmable from 450W to 1000W TDP (CPU + GPU + memory)

2x/4x High-efficiency 2000W/2400W PSU

2x/4x PCIe gen4/5 M.2 22110/228 slots on board

2x/4x/8x PCIe gen4/5 drive slots (NVMe)

2x/3x FHFL PCIe Gen5 x16

1x/3x/4x USB 3.0 ports

2x RJ45 10GbE ports

1x RJ45 IPMI port

1x Mini display port

Halogen-free power cables

Stainless steel bolts

Very quiet, the factory setting is 25 decibels (fan speed and thus noise level can be individually and manually configured from 0 to 100% PWM duty cycle)

2 years manufacturer's warranty

244 x 567 x 523 mm (20.6 x 9.6 x 22.3") or 255 x 565 x 530 mm (20.9 x 10 x 22.2") or 250 x 404 x 359 mm (9.8 x 15.9 x 14.1")

30 kg (66 lbs) or 20 kg (44 lbs)

Optional components

Liquid cooling

Bigger custom air-cooled heatsink

NIC Nvidia Bluefield-3 400Gb

NIC Nvidia ConnectX-7 200Gb

NIC Intel 100Gb

WLAN + Bluetooth card

Up to 2x 8TB M.2 SSD

Up to 8x 8TB E1.S SSD

Up to 10x 60TB 2.5" SSD

Storage controller

Raid controller

Additional USB ports

Multi-display graphics card

Sound card

Mouse

Keyboard

Consumer or industrial fans

Intrusion detection

OS preinstalled

Anything possible on request

What are the main differences between the offered GH200 models?

GH200: metal or glass tower, air-cooled, with 1 of 2 M.2 and 0 of 4 E1.S hard disks, 3x USB

GH200 Special Edition: metal or glass tower, air-cooled, without M.2 (0 of 2) and 2.5" (0 of 4) hard disks, 1x USB (mini USB hub included: 3x USB)

GH200 Super: metal or glass tower, air-cooled, with one M.2 (1 of 2) and no E1.S (0 of 8) hard disks, 1x USB (mini USB hub included: 3x USB)

GH200 Giga: metal or glass tower, air-cooled, with one M.2 (1 of 2) and no 2.5" (0 of 4) hard disks, 2x USB

GH200 Liquid: metal or glass tower, liquid-cooled, comes with 1 of 2 M.2 and 0 of 8 E1.S hard disks, 4x USB

GH200 Mini: mini tower, air-cooled, comes with 1 of 2 M.2 and 0 of 2 E1.S hard disks, 1x USB (mini USB hub included: 3x USB)

GH200 Dual: Two NVlinked superchips, metal or glass tower, air-cooled, comes with 2 of 4 M.2 and 0 of 8 E1.S hard disks, 4x USB

GH200 Special Edition Barebone: metal or glass tower, without superchip, air-cooled, without M.2 (0 of 2) and 2.5" (0 of 4) hard disks, 1x USB (mini USB hub included: 3x USB)

Comparison chart: GH200 comparison chart.pdf

Compute performance

67/134 teraFLOPS FP64

989/1978 teraFLOPS TF32

1,979/3,958 teraFLOPS FP16

3,958/7,916 teraFLOPS FP8

3,958/7,916 TOPS INT8

Benchmarks
Phoronix is currently benchmarking our GH200 576GB model prototype. Initial results are available here:

https://www.phoronix.com/review/nvidia-gh200-gptshop-benchmark

https://www.phoronix.com/review/nvidia-gh200-amd-threadripper

https://www.phoronix.com/review/aarch64-64k-kernel-perf

https://www.phoronix.com/review/nvidia-gh200-compilers

White paper: Nvidia GH200 Grace-Hopper white paper

Preorder GB200 Blackwell

The coming Nvidia GB200 Grace-Blackwell Superchip has truly amazing specs to show off. GPTshop.ai systems with Nvidia GB200 Grace-Blackwell will arrive Q4 2024. Be one of the first in the world to get a GB200 desktop/workstation. Preorder now for $5,000 (5,000 €) deposit.