Fabian Lee : Software Engineer

Mac: LLama2 model on Apple Silicon and GPU using llama.cpp

January 6, 2024
Categories: Mac

It is relatively easy to experiment with a base LLama2 model on M-family Apple Silicon, thanks to llama.cpp, written by Georgi Gerganov. The llama.cpp project provides a C++ implementation for running LLama2 models and takes advantage of the Apple integrated GPU to offer a performant experience (see M family performance specs).

GCP: deploying a Python WSGI Gunicorn app on Cloud Run

April 27, 2023
Categories: Development, Hyperscaler, Python

Flask's built-in web server is suitable during development, but if you are going to deploy to a production environment, a Python WSGI server such as Gunicorn should be used. This also applies to Python Flask apps deployed to GCP Cloud Run. Gunicorn is necessary to tune the worker and thread count of each instance to …
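
As a rough illustration of the worker and thread tuning mentioned above, here is a minimal sketch of a Gunicorn configuration file. It is not taken from the post itself: the file name, the default values, and the `WEB_WORKERS` and `WEB_THREADS` environment variable names are assumptions made for this example. The `bind`, `workers`, `threads`, and `timeout` names are standard Gunicorn settings, and `PORT` is the environment variable Cloud Run uses to tell the container which port to listen on.

```python
# gunicorn.conf.py (hypothetical file name and values, not from the original post)
import os

# Cloud Run passes the listening port in the PORT environment variable;
# fall back to 8080 for local runs.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# Worker processes and threads per worker. WEB_WORKERS and WEB_THREADS are
# hypothetical variable names chosen for this sketch.
workers = int(os.environ.get("WEB_WORKERS", "1"))
threads = int(os.environ.get("WEB_THREADS", "8"))

# A value of 0 disables Gunicorn's own worker timeout, leaving request
# timeouts to the platform.
timeout = 0
```

Assuming the Flask object is named `app` inside a module called `main` (both hypothetical names), the server could then be started with `gunicorn --config gunicorn.conf.py main:app`. Reading the counts from environment variables means they can be changed per deployment without rebuilding the image.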

ELK: Scaling an ElasticSearch Cluster

November 28, 2016
Categories: DevOps

The heart of the ELK stack is Elasticsearch. To provide high availability and scalability, it needs to be deployed as a cluster with master and data nodes. The Elasticsearch cluster is responsible for both indexing incoming data and running searches against that indexed data. Resources: As described in the documentation, if there …

Ubuntu: Using a swap file instead of swap partition for virtualized server VMs

July 18, 2016
Categories: Linux

Before virtualization, there was a stronger argument for using a swap partition instead of a swap file for servers. A fragmented swap file could lead to performance issues that a statically sized and placed partition did not have to consider. But once virtualization comes into play, unless you go to great lengths to segment your storage …