Apple has just released SD for Mac #344

Open
aleemb opened this issue Dec 1, 2022 · 20 comments

Comments


aleemb commented Dec 1, 2022

Apple has just released SD for Mac with better performance: https://github.com/apple/ml-stable-diffusion



arunavo4 commented Dec 2, 2022

Requires macOS 13.1, which is currently in beta 4. Release date: mid-December.


pressreset commented Dec 2, 2022

From my understanding, only the Swift libraries require 13.1 to integrate SD via Core ML into your Swift app. DBee uses Electron as the frontend and Python as its backend, so it should really just be a matter of converting models to Core ML using the torch2coreml conversion and then changing how diffusionbee_backend.py is implemented.
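
Roughly, that two-step path might look like the following (a hedged sketch: the output directory names are placeholders, and diffusionbee_backend.py would then drive the second command instead of the current torch/MPS pipeline):

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder -o coreml_models
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i coreml_models -o output --compute-unit ALL --seed 93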

@IIIIIIIllllllllIIIII
Copy link

@pressreset

Just confirmed.

It will only build on 13.1, even with the Python distro, because the Core ML model versions it uses are v7, not v6.

RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".
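
If anyone wants to confirm which spec version a converted model targets without triggering compilation, something like this one-liner should work (hedged: it assumes coremltools' skip_model_load option handles .mlpackage paths, and the model path is a placeholder):

python -c "import coremltools as ct; print(ct.models.MLModel('path/to/model.mlpackage', skip_model_load=True).get_spec().specificationVersion)"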

@pressreset

Here is the good news/bad news.

Good news:

  • Swift implementation loading times for the model are reduced to 2 seconds.
  • 8GB M1 devices and iOS devices are now supported.
  • Memory pressure is reduced to around 3GB for the Core ML implementation.
  • It's significantly faster.

Bad news:

  • It doesn't run on anything but 13.1+.
  • Someone will need to fork Apple's ml-stable-diffusion repo and disable --safety-checker in the build.
  • Models will need to be converted to Core ML, and we will need conversion tools for that.
  • There is currently no way (that I am aware of) to train a model in Core ML directly, so models will need to be trained/merged in PyTorch outside of the Core ML model implementation and then converted to Core ML models (see the hedged conversion sketch after this comment).
  • I am sure Apple will change their App Store requirements so that the --safety-checker flag must be enabled.

This sucks for my use cases, because I am processing video frames: not only will it falsely flag things, but some form of nudity is just going to appear in people's films at certain points, and I can't control what people are processing in my plugins/apps.
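
On that training/merging point: assuming torch2coreml's --model-version also accepts a local diffusers-format checkpoint path the way from_pretrained normally does (an assumption on my part, not something the Apple repo explicitly promises), converting a custom or merged model might look roughly like this, with the paths as placeholders:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --model-version /path/to/merged-diffusers-checkpoint -o custom_coreml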


cpietsch commented Dec 2, 2022

I am currently converting SD 2.0 on my MacBook Pro 16 M1.
There is an initial warning:
!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!
but so far it does its job...

1 minute 42 seconds on macOS 13.0:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i sd2.0 -o out --compute-unit ALL --seed 93 --model-version stabilityai/stable-diffusion-2-base
WARNING:coremltools:Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 12 files: 100%|████████████████████████████████████████| 12/12 [00:00<00:00, 14669.67it/s]
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from sd2.0
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 15.3 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 118.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 5.4 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [01:42<00:00,  2.02s/it]
INFO:__main__:Saving generated image to out/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png


vicento commented Dec 2, 2022

Apple has already converted the stock Stable Diffusion models to Core ML 👍:
https://huggingface.co/apple

If you want to convert your own custom model:

Converting Models to Core ML

Step 1: Create a Python environment and install dependencies:

conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
cd /path/to/cloned/ml-stable-diffusion/repository
pip install -e .

Step 2: Log in to or register for your Hugging Face account, generate a User Access Token and use this token to set up Hugging Face API access by running huggingface-cli login in a Terminal window.

Step 3: Navigate to the version of Stable Diffusion that you would like to use on Hugging Face Hub and accept its Terms of Use. The default model version is CompVis/stable-diffusion-v1-4. The model version may be changed by the user as described in the next step.

Step 4: Execute the following command from the Terminal to generate Core ML model files (.mlpackage):

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o <output-mlpackages-directory>

WARNING: This command will download several GB worth of PyTorch checkpoints from Hugging Face.

This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (.mlpackage) and saved into the specified <output-mlpackages-directory>. Some additional notable arguments:

--model-version: The model version defaults to CompVis/stable-diffusion-v1-4. Developers may specify other versions that are available on Hugging Face Hub, e.g. stabilityai/stable-diffusion-2-base & runwayml/stable-diffusion-v1-5.

--bundle-resources-for-swift-cli: Compiles all 4 models and bundles them along with the necessary resources for text tokenization into <output-mlpackages-directory>/Resources, which should be provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.

--chunk-unet: Splits the Unet model into two approximately equal chunks (each with less than 1GB of weights) for mobile-friendly deployment. This is required for ANE deployment on iOS and iPadOS but not for macOS. The Swift CLI is able to consume both the chunked and regular versions of the Unet model but prioritizes the former. Note that the chunked Unet is not compatible with the Python pipeline, because the Python pipeline is intended for macOS only; chunking is for on-device deployment with Swift only.

--attention-implementation: Defaults to SPLIT_EINSUM which is the implementation described in Deploying Transformers on the Apple Neural Engine. --attention-implementation ORIGINAL will switch to an alternative that should be used for non-ANE deployment. Please refer to the Performance Benchmark section for further guidance.

--check-output-correctness: Compares original PyTorch model's outputs to final Core ML model's outputs. This flag increases RAM consumption significantly so it is recommended only for debugging purposes.
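
Putting those flags together, a conversion aimed at the Swift CLI on a GPU-only Mac might look roughly like this (a hedged sketch; the output directory name is a placeholder and the flag combination simply follows the descriptions above):

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --model-version stabilityai/stable-diffusion-2-base --bundle-resources-for-swift-cli --attention-implementation ORIGINAL -o sd2_coreml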


7k50 commented Dec 2, 2022

Please let us know if/when/how Apple's SD implementation can be adopted in DiffusionBee. I'm hoping this will happen sooner or later.

@pressreset

@vicento I've already done the build/conversion and set up a Swift project with a basic prompt on 13.1. The Apple-provided library only converts the default SD models. It will download them into ~/.cache/HuggingFace automatically and then convert. It takes about 5 minutes to convert on an M1 with 16GB.

@gingerbeardman

How much faster is this Apple version?

@pressreset

@gingerbeardman The Core ML Swift package takes between 2-3 seconds to load the model when it is in Core ML format. The Python package takes anywhere from 5-9 seconds, sometimes a little longer, but it is still a significant speed increase over the existing torch MPS implementation. It also requires less memory (around 3GB), which results in lower memory pressure overall. Generation times vary depending on the compute unit chosen: the available options are CPU/GPU, CPU/NE, and ALL, and ALL is not always as fast as CPU/GPU or CPU/NE depending on the operations being performed. Overall, generation times are reduced to a fraction of what they were. Apple's repo can only generate 512x512 at the moment, so it is up to whoever forks their packages to implement different output sizes. As always, any increase in pixels requires more memory and generation time.
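
To compare compute units with the Python pipeline, you can rerun the same command with a different --compute-unit value (a hedged example based on the command earlier in this thread; I am assuming the CLI spells the options CPU_AND_GPU and CPU_AND_NE, and the model directory is a placeholder):

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i sd2.0 -o out --compute-unit CPU_AND_NE --seed 93 --model-version stabilityai/stable-diffusion-2-base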


cpietsch commented Dec 3, 2022

For now it looks like macOS 13.1+ is required for best performance; it can still run (slowly) on older versions.

@pressreset

If I had to guess, it will probably be like the different builds that exist now for Intel vs. M-series, or there will be one build that uses the best option. I doubt Divam is going to just tell everyone "You have to upgrade to 13.1." Right now there is an M build, an M HQ build, and an Intel build.


tenko23 commented Dec 9, 2022

...an Intel build above a certain macOS version, that is ;)


aajank commented Dec 13, 2022

Since 13.1 is here.


godly-devotion commented Dec 17, 2022

Okay, so I was able to quickly cobble an app together using Apple's SD implementation. You do need the latest version of Ventura, but performance does look promising (since it's all native).
https://github.com/godly-devotion/MochiDiffusion

@juan9999

Mochi's performance is about 11% faster when generating 8 images on a 32GB Max.


tenko23 commented Dec 19, 2022

I didn't time it, but on a Mac mini M1 with 16GB RAM, it only took a fraction of the time for one image.

@whosawhatsis

Using the Core ML code running on the GPU of an M1 Pro/Max with lots of RAM seems to be a small but nice improvement in speed.

The big improvements, though, come when you use the Neural Engine for image generation. These improvements don't make it faster on these more powerful chips, but they make the process much more efficient, using under 1GB of RAM and under 5W. That makes it a HUGE improvement for lower-power machines like the base M1/M2 models, and even the A14, which has the same 16-core Neural Engine, should get similar performance.

@Zabriskije

I’ve tested the speed difference on a MacBook Air M1 (8-core CPU/GPU, 8GB RAM) between DiffusionBee and Mochi Diffusion (both single image, 30 steps, 512x512, Anything v3.0; Mochi with CPU/NE):

  • DiffusionBee: 40.46s
  • Mochi Diffusion: 21.03s

That’s a pretty nice jump.
