Mac: Llama 2 model on Apple Silicon and GPU using llama.cpp

It is relatively easy to experiment with a base Llama 2 model on M-family Apple Silicon, thanks to llama.cpp, written by Georgi Gerganov.

The llama.cpp project provides a C++ implementation for running Llama 2 models, and it takes advantage of the integrated Apple GPU to deliver good performance (see M family performance specs).
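Since everything that follows assumes an M-family Mac, it can be worth confirming the machine type first. The function below is a minimal sketch, not part of llama.cpp or the wrapper script; the name is_apple_silicon is our own.

```shell
#!/bin/sh
# Succeeds (exit 0) only for macOS running on an arm64 (Apple Silicon) CPU.
# Both arguments are optional and default to what uname reports.
# Note: a shell running under Rosetta 2 reports x86_64 instead of arm64.
is_apple_silicon() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}
```

For example, `is_apple_silicon && echo "Apple Silicon detected"` prints the message only on a matching machine.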

Download model and run

The sychhq team wrote a wrapper script to make it more convenient to:

  • Download the open-source Llama 2 model from Tom Jobbins (TheBloke) at huggingface.co
  • Compile llama.cpp with clang
  • Run the llama binary ‘main’, which provides an interactive prompt
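To give a feel for the download step, here is a minimal sketch of how a direct-download URL for a Hugging Face model file can be assembled. The helper name model_url is hypothetical, and the repository and filename are assumed example values; the wrapper script's actual defaults may differ.

```shell
#!/bin/sh
# Build the direct-download URL for a file in a Hugging Face model repository.
# $1 = repository (owner/name), $2 = filename, $3 = branch (defaults to main).
model_url() {
    printf 'https://huggingface.co/%s/resolve/%s/%s\n' "$1" "${3:-main}" "$2"
}

# Assumed example values, not taken from the wrapper script.
REPO="TheBloke/Llama-2-7B-Chat-GGUF"
FILE="llama-2-7b-chat.Q4_K_M.gguf"

# Print the URL only; nothing is downloaded here.
model_url "$REPO" "$FILE"

# To actually fetch the file (several GB), one could run:
#   curl -L -o "$FILE" "$(model_url "$REPO" "$FILE")"
```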

We are going to use the project described here, but we need to apply a patch on top so that it uses the newer GGUF file format, which is the format current versions of llama.cpp expect.

# download wrapper script project
git clone https://github.com/sychhq/llama-cpp-setup.git
cd llama-cpp-setup

# apply necessary patch to use newer GGUF file format
git pull --no-edit --no-ff https://github.com/ctwardy/llama-cpp-setup.git patch-1
# fix flag syntax in patch
# (the stock BSD sed on macOS needs an explicit backup suffix after -i; '' means no backup)
sed -i '' 's/ -n-gpu-layers/ --n-gpu-layers/' setup.sh

# press <ENTER> twice to accept defaults for model file and filename
# wait a few minutes for compilation and then model download
./setup.sh
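If the run fails while loading the model, one quick thing to check is whether the downloaded file really is in GGUF format. GGUF files begin with the four ASCII bytes "GGUF", so a minimal sketch of such a check could look like this (is_gguf is our own helper name, and the model path is whatever the script downloaded):

```shell
#!/bin/sh
# Succeeds (exit 0) if the given file exists and starts with the
# 4-byte GGUF magic "GGUF"; fails otherwise.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}
```

Usage would be along the lines of `is_gguf path/to/model.gguf && echo "looks like GGUF"`.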

Interactive Prompt

After the compilation phase, the model file is downloaded, which can take a few minutes.  If all goes well, the binary will then run and present you with an interactive prompt.

You can then type in whatever phrase, topic, or question you desire, and llama.cpp will use the model to generate a response.  All processing is local; no external URLs or services are called.
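The same binary can also answer a single prompt and exit, which is handy for scripting. The sketch below uses the -m (model), -p (prompt), -n (tokens to generate), and --n-gpu-layers options of the llama.cpp ‘main’ binary. The paths, the values 1 and 256, and the helper name ask are assumptions; adjust them to wherever setup.sh placed the binary and model on your machine.

```shell
#!/bin/sh
# Assumed locations; override via environment variables if yours differ.
LLAMA_BIN="${LLAMA_BIN:-./llama.cpp/main}"
MODEL="${MODEL:-./llama.cpp/models/llama-2-7b-chat.Q4_K_M.gguf}"

# Run one prompt and exit instead of entering the interactive loop.
# With DRY_RUN=1 the command is only printed, not executed.
ask() {
    set -- "$LLAMA_BIN" -m "$MODEL" --n-gpu-layers 1 -n 256 -p "$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling `DRY_RUN=1 ask "what language do they speak in moldova?"` shows the command that would be run without loading the model.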

If you keep the macOS Activity Monitor open while your prompts are being processed, you will see significant GPU time attributed to the “main” binary.

Example prompts and replies

what language do they speak in moldova?

The official language of Moldova is Romanian, which is spoken by the vast majority of the population. However, there are also several minority languages spoken in Moldova, including:

* Russian: spoken by around 10% of the population, primarily in the breakaway region of Transnistria.
* Ukrainian: spoken by around 5% of the population, primarily in the north and west of the country.
* Gagauz: spoken by around 3% of the population, primarily in the south of the country.
* Other languages: There are also several other minority languages spoken in Moldova, including Turkish, Bulgarian, and sign language.
It's worth noting that while Romanian is the official language of Moldova, there are also many dialects and regional variations within the country, reflecting the diverse cultural heritage of the Moldovan people.

what is the best movie in the matrix franchise?

The Matrix franchise consists of four movies, and opinions on which one is the best can vary depending on personal taste. Here are some general pros and cons of each movie in the franchise:

* The Matrix (1999): This is the original movie that started the franchise and is widely considered a classic. It introduced innovative special effects, a unique storyline, and memorable characters. Pros: groundbreaking visual effects, intriguing storyline, iconic characters. Cons: some scenes may feel slow or confusing for non-fans of the genre.
* The Matrix Reloaded (2003): This movie picks up where the first one left off and continues to explore the complex world of the Matrix. Pros: action-packed, well-developed characters, expanded universe. Cons: some plot points may feel repetitive or unnecessary.
* The Matrix Revolutions (2003): The final installment of the trilogy brings a satisfying conclusion to the story and features impressive action sequences. Pros: exciting climax, emotional payoff, well-crafted ending. Cons: some plot threads may feel rushed or unresolved.
* Matrix 4 (2020): This fourth installment of the franchise takes place after the events of Revolutions and introduces new characters while exploring the next chapter in the Matrix's evolution. Pros: exciting action sequences, intriguing new characters, expanded universe. Cons: some fans may feel disconnected from the story due to the time gap between installments.

Ultimately, the best movie in the Matrix franchise is a matter of personal preference. Some fans may prefer the originality and groundbreaking visuals of the first movie, while others may enjoy the action-packed sequels that expand on the universe.

in less than 3 sentences, tell me the best movie in the matrix franchise

The best movie in the Matrix franchise is The Matrix (1999). It offers a unique and thought-provoking storyline with groundbreaking visual effects that have become iconic in popular culture. While opinions may vary, many fans consider this the standout installment of the series.

summarize the apple m1 gpu specs

The Apple M1 GPU is a custom-designed graphics processing unit (GPU) that is integrated into Apple's latest MacBooks. Here are some key specifications of the M1 GPU:

* Number of Cores: 8
* Clock Speed: Up to 2.6 GHz
* Memory Interface: 128 bits
* Memory Bandwidth: 1024 GB/s
* Texture Mipmap Size: 1024x1024 pixels
* Render Output Engine: 4
* Shader Cores: 4
* Compute Units: 8
* GPU Frequency: Up to 3.0 GHz (turbo boost)

The M1 GPU is based on Apple's custom ARM architecture and is designed specifically for use with Apple's own Silicon. This allows for better performance and power efficiency compared to using a traditional Intel or AMD GPU. The M1 GPU also supports advanced features such as ray tracing, Metal 3, and AVX-512 instructions.
It's worth noting that the exact specifications of the M1 GPU may vary depending on the specific MacBook model and generation.

REFERENCES

github, llama.cpp project

llama.cpp, M family performance

llama.cpp, A family performance

huggingface.co, Llama-2-7B-Chat-GGUF model

Rohan Chopra, how to run llama2 locally

Sandeep Sharma, GGML versus newer GGUF

Phillip Gimmi, what is GGML and GGUF