It is relatively easy to experiment with a base Llama 2 model on Ubuntu, thanks to llama.cpp, written by Georgi Gerganov.

The llama.cpp project provides a C++ implementation for running Llama 2 models, and it works even on systems with only a CPU (although a CUDA-capable GPU improves performance significantly).
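
As a rough way to tell which case applies to your machine, the sketch below checks whether the NVIDIA driver utility nvidia-smi is on the PATH and can list a GPU. The function name has_cuda_gpu is made up for illustration. A positive result only suggests a usable GPU; it does not confirm that the CUDA toolkit needed to build llama.cpp with GPU support is installed.

```shell
# Hypothetical helper (not part of llama.cpp): succeeds only if nvidia-smi
# is installed and is able to list at least one GPU.
has_cuda_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

if has_cuda_gpu; then
  echo "NVIDIA GPU detected: GPU offloading may be possible"
else
  echo "No NVIDIA GPU detected: llama.cpp will run on the CPU"
fi
```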

Download model and run

The sychhq team wrote a wrapper script that makes it more convenient to:

Download the open-source Llama 2 model from Tom Jobbins (TheBloke) at huggingface.co
Compile llama.cpp
Run the llama binary ‘main’, which provides an interactive prompt

We are going to use this wrapper project, but we need to apply a patch on top of it so that it uses the newer GGUF model file format, which is the format compatible with current llama.cpp.

# download wrapper script project
git clone https://github.com/sychhq/llama-cpp-setup.git
cd llama-cpp-setup

# apply necessary patch to use newer GGUF file format
git pull --no-edit --no-ff https://github.com/ctwardy/llama-cpp-setup.git patch-1
# fix flag syntax in patch
sed -i 's/ -n-gpu-layers/ --n-gpu-layers/' setup.sh

# press <ENTER> twice to accept defaults for model file and filename
# wait a few minutes for compilation and then model download
./setup.sh
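
If the download is interrupted, or an older non-GGUF model file is picked up by mistake, the model will typically fail to load. As a quick sanity check, a GGUF file begins with the four ASCII bytes GGUF, which the sketch below tests for. The function name is_gguf and the path in the usage comment are illustrative assumptions; where setup.sh stores the model depends on the script.

```shell
# Hypothetical helper: succeeds if the file exists and starts with the
# four-byte GGUF magic. NUL bytes are stripped so that unrelated binary
# files do not trigger shell warnings during the comparison.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

# Usage (the path is an assumption; substitute the file you downloaded):
#   is_gguf models/llama-2-7b-chat.Q4_K_M.gguf && echo "looks like GGUF"
```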

Interactive Prompt

After the compilation phase, the model file is downloaded, which can take a few minutes. If all goes well, the ‘main’ binary is then run and you are presented with an interactive prompt.

You can then type in whatever phrase, topic, or question you desire, and llama.cpp will use the model to generate a response. All processing happens locally; no external URLs or services are called.
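
For scripted use, the same binary can also be given a single prompt on the command line instead of opening the interactive session. The sketch below wraps that in a small function. The name ask, the default paths, and the 128-token limit are assumptions for illustration; adjust the paths to wherever setup.sh built llama.cpp and saved the model. Output from a one-shot prompt may also differ in style from the interactive session, since the wrapper script may pass extra options.

```shell
# Assumed locations; override LLAMA_BIN and MODEL to match your setup.
LLAMA_BIN="${LLAMA_BIN:-./llama.cpp/main}"
MODEL="${MODEL:-./llama.cpp/models/llama-2-7b-chat.Q4_K_M.gguf}"

# Hypothetical wrapper: run one prompt and let the binary exit afterwards.
ask() {
  # -m: model file, -p: prompt text, -n: maximum number of tokens to generate
  "$LLAMA_BIN" -m "$MODEL" -p "$1" -n 128
}

# Usage:
#   ask "what language do they speak in moldova?"
```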

Example prompts and replies

what language do they speak in moldova?

The official language of Moldova is Romanian, which is spoken by the vast majority of the population. However, there are also several other languages spoken in Moldova, including Russian, Ukrainian, and Gagauz.

what is the best movie in the matrix franchise?

The best movie in the Matrix franchise is a matter of personal opinion, as each film has its own unique strengths and weaknesses. However, here are some general rankings based on critical and commercial success:

1. The Matrix (1999) - This is the original and most highly regarded film in the franchise, known for its innovative special effects, intricate storyline, and thought-provoking themes. It has a 87% approval rating on Rotten Tomatoes and was nominated for four Academy Awards.
2. The Matrix Reloaded (2003) - While not as universally acclaimed as the first film, The Matrix Reloaded has its own merits, including impressive action sequences and a deeper exploration of the Matrix's mythology. It has a 74% approval rating on Rotten Tomatoes.
3. The Matrix Revolutions (2003) - The final installment of the trilogy, The Matrix Revolutions, completes the story arc of the franchise and offers a satisfying conclusion to the series. While it may not be as groundbreaking as the first two films, it has an 81% approval rating on Rotten Tomatoes.
Of course, opinions on the best movie in the Matrix franchise can vary depending on individual tastes and preferences. Some fans may prefer the more action-oriented approach of Reloaded and Revolutions, while others may appreciate the more cerebral and philosophical themes explored in the first film. Ultimately, the best movie in the Matrix franchise is a matter of personal preference.

in less than 3 sentences, tell me the best movie in the matrix franchise

The best movie in the Matrix franchise is The Matrix (1999), which is highly regarded for its innovative special effects, intricate storyline, and thought-provoking themes. It has a 87% approval rating on Rotten Tomatoes and was nominated for four Academy Awards. While opinions may vary, many fans consider it the standout film in the trilogy.

summarize the apple m1 gpu specs

The Apple M1 GPU, which is integrated into the Apple Silicon processors used in Mac computers, has several key specifications that set it apart from other mobile GPUs. Here are some of the notable specs for the Apple M1 GPU:

* Number of Cores: 8
* Clock Speed: Up to 3.2 GHz
* Memory Interface: 128-bit
* Memory Bandwidth: 1024 GB/s
* Texture Addressing: 1024-bit
* Render Output Engine (ROE): 4
* Compute Units: 768
* Floating Point Operations Per Second (FLOPS): 1.5 TFLOPS
* Integer Operations Per Second (INT OPS): 3.2 TOPS
* Memory Burst Size: 128 KB
* Memory Bandwidth per Core: 64 GB/s
* GPU Frequency: Up to 3.2 GHz

These specs indicate that the Apple M1 GPU is a high-performance GPU that is capable of handling demanding graphics and compute tasks, while also providing efficient power management and low heat dissipation.

REFERENCES

github, llama.cpp project

huggingface.co, Llama-2-7B-Chat-GGUF model

Rohan Chopra, how to run llama2 locally

Sandeep Sharma, GGML versus newer GGUF

Phillip Gimmi, what is GGML and GGUF

Intel whitepaper, optimizing and running Llama2 on Intel using oneAPI toolkit

Ryan Stewart, enabling CUDA-capable GPU for faster processing in llama.cpp

nvidia.com, getting better outputs from your LLM, param explanations