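Ollama is an open-source framework for running large language models (LLMs) locally. It supports a variety of operating systems, but not FreeBSD, so I tried compiling and installing it on FreeBSD myself.
In short, the official Ollama source failed to compile, but a custom version succeeded. Because that version modifies the code, I did all the work inside a FreeBSD jail for security.
Added May 29, 2025: FreeBSD 14.2 has an Ollama pkg package, so it can be installed directly with pkg install. For details, see "Installing Ollama on FreeBSD and trying out the DeepSeek r1 large model".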
Installing Ollama on FreeBSD (first attempt, failed)
Setting up the build environment
First, install the latest version of Go:
pkg install go122-1.22.5 cmake
I later found that this did not work (I had originally been running it as the go122 command), so I installed the default Go package instead:
pkg install go
However, that version is old.
So I downloaded a newer release to try instead: https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz
wget https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz
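Extract it:
tar -xzvf go1.22.5.freebsd-amd64.tar.gz
Add its bin directory to PATH (the archive unpacks into a go/ directory; here it was extracted under /home/skywalk/work):
export PATH=/home/skywalk/work/go/bin:$PATH
Go is now at version 1.22.5: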
$ go version
go version go1.22.5 freebsd/amd64
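To confirm from inside Go that the toolchain on PATH is the one just unpacked, a tiny program can inspect runtime.Version(), which reports the version of the toolchain that compiled it. This is a minimal sketch: the helper atLeast is hypothetical, and the 1.22 threshold is only the version used in this post, not a documented Ollama requirement.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// atLeast reports whether a Go version string such as "go1.22.5" is at
// least major.minor. Pre-release suffixes ("go1.23rc1") are tolerated;
// unparseable strings such as devel builds return false.
func atLeast(v string, major, minor int) bool {
	parts := strings.SplitN(strings.TrimPrefix(v, "go"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	// Keep only the leading digits of the minor part, e.g. "23rc1" -> "23".
	m := parts[1]
	if i := strings.IndexFunc(m, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		m = m[:i]
	}
	mnr, err := strconv.Atoi(m)
	if err != nil {
		return false
	}
	if maj != major {
		return maj > major
	}
	return mnr >= minor
}

func main() {
	// runtime.Version reports the toolchain that compiled this program,
	// normally the go command found first on PATH when running `go run`.
	v := runtime.Version()
	fmt.Printf("toolchain: %s (%s/%s)\n", v, runtime.GOOS, runtime.GOARCH)
	if !atLeast(v, 1, 22) {
		fmt.Println("warning: older than go1.22; check which go comes first on PATH")
	}
}
```

With the PATH set as above, `go run` on this file should report go1.22.5 on freebsd/amd64; an older version points to the pkg-installed go shadowing the unpacked one.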
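Speed up Go module downloads with a proxy:
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Specify repositories that bypass the proxy (optional; these values are placeholders)
export GOPRIVATE=git.mycompany.com,github.com/my/private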
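Both variables are comma-separated lists. In GOPROXY, entries are tried in order: after a comma, the next entry is used only when the previous proxy answers 404 or 410 (a pipe, |, falls through on any error), and direct means fetching straight from the version-control host. GOPRIVATE holds path.Match glob patterns that are matched against the leading path elements of a module path; matching modules skip the proxy and the checksum database. The sketch below imitates that prefix rule to show which modules the example values would affect. matchesPrivate is a hypothetical helper modeled on the documented behavior, not the go command's own code, and the GOPRIVATE value is the placeholder from the commands above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesPrivate reports whether modPath matches any comma-separated glob in
// patterns. Each glob is compared (with path.Match) against as many leading
// path elements of modPath as the glob itself has, so "github.com/my/private"
// also covers "github.com/my/private/tool".
func matchesPrivate(patterns, modPath string) bool {
	elems := strings.Split(modPath, "/")
	for _, glob := range strings.Split(patterns, ",") {
		glob = strings.TrimSuffix(glob, "/")
		if glob == "" {
			continue
		}
		n := strings.Count(glob, "/") + 1 // path elements in the glob
		if len(elems) < n {
			continue // module path is shorter than the pattern
		}
		if ok, _ := path.Match(glob, strings.Join(elems[:n], "/")); ok {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder GOPRIVATE value copied from the commands above.
	private := "git.mycompany.com,github.com/my/private"
	for _, m := range []string{
		"github.com/pdevine/tensor", // an Ollama dependency: still goes through the proxy
		"git.mycompany.com/team/lib",
		"github.com/my/private/tool",
	} {
		fmt.Printf("%-28s private=%v\n", m, matchesPrivate(private, m))
	}
}
```

The point of the sketch is that the placeholder GOPRIVATE value matches none of Ollama's dependencies, so it is harmless but unnecessary for this build.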
Building Ollama
Clone Ollama from the official repository:
git clone https://github.com/ollama/ollama
Run the code generation step:
go generate ./...
Build:
go build .
However, the build failed at this point, ending with the following error:
skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
imports github.com/ollama/ollama/cmd
imports github.com/ollama/ollama/server
imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c
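The go command reports this error when a package directory contains .c files but none of the Go files selected for the current build import "C". That happens either when cgo is disabled (CGO_ENABLED=0) or when the cgo-using Go files carry build constraints that exclude the target OS. The sketch below reproduces the second case with the standard go/build package: it writes a hypothetical three-file package into a temporary directory and checks which files go/build selects. The `//go:build linux || windows` constraint on the cgo file is an assumption standing in for the upstream gpu package at the time, not a quote of it; the function and file names are illustrative.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// reproducesCgoError builds a throwaway package whose only cgo file is
// restricted to linux/windows, then reports whether go/build selects C
// sources without any cgo Go file for the given GOOS and cgo setting.
func reproducesCgoError(goos string, cgo bool) (bool, error) {
	dir, err := os.MkdirTemp("", "cgocheck")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	files := map[string]string{
		// Hypothetical cgo file, limited to other platforms.
		"gpu.go": "//go:build linux || windows\n\npackage gpu\n\nimport \"C\"\n",
		// Plain Go file with no constraint, selected on every OS.
		"types.go": "package gpu\n\ntype Info struct{}\n",
		// C source with no build constraint, so it is also selected everywhere.
		"gpu_info.c": "int gpu_info_stub(void) { return 0; }\n",
	}
	for name, src := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(src), 0o644); err != nil {
			return false, err
		}
	}

	ctxt := build.Default
	ctxt.GOOS = goos
	ctxt.CgoEnabled = cgo
	pkg, err := ctxt.ImportDir(dir, 0)
	if err != nil {
		return false, err
	}
	// Same condition as the error: C files present, but no cgo Go files.
	return len(pkg.CFiles) > 0 && len(pkg.CgoFiles) == 0, nil
}

func main() {
	for _, tc := range []struct {
		goos string
		cgo  bool
	}{{"freebsd", true}, {"linux", true}, {"linux", false}} {
		bad, err := reproducesCgoError(tc.goos, tc.cgo)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("GOOS=%-8s cgo=%-5v -> error expected: %v\n", tc.goos, tc.cgo, bad)
	}
}
```

If this reading is right, enabling cgo alone would not fix the FreeBSD build; the package needs cgo files (or constraints) that cover freebsd, which is the kind of change the patched fork used below makes.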
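Debugging in a FreeBSD jail (second attempt, failed)
Create a FreeBSD jail and log in to it:
# cbsd jlogin fb12
The login shell is csh; if you are not used to it, you can switch to bash.
Install the required packages:
# pkg install -y git go122 cmake vulkan-headers vulkan-loader
Download the custom (patched) version: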
# git clone --depth 1 https://github.com/prep/ollama.git
# git clone https://github.com/prep/ollama.git
git clone https://github.com/prep/ollama
Switch branches (I did not switch here):
# cd ollama && git checkout feature/add-bsd-support
First, configure the module proxy.
For csh (set only creates a shell variable that child processes do not see, so use setenv):
# setenv GO111MODULE on
# setenv GOPROXY https://goproxy.io,direct
# setenv GOPRIVATE git.mycompany.com,github.com/my/private
For bash:
# Enable Go Modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Specify repositories that bypass the proxy (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
Run go generate and go build:
# go122 generate ./...
# go122 build .
It eventually failed with this error:
go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
Building the custom Ollama as a regular user in a FreeBSD jail (third attempt, success)
If errors occur, the go.sum and go.mod files need to be fixed as described below.
Use the following commands:
bash
mkdir github.com
cd github.com
git clone https://github.com/prep/ollama.git
cd ollama && git checkout feature/add-bsd-support
# Enable Go Modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Specify repositories that bypass the proxy (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private
go122 generate ./...
go122 build .
Debugging the error
There was still an error:
go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
Edit go.sum and change the pdevine/tensor entries to the following:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=
Also edit go.mod and change the pdevine/tensor version to the latest one, dated May 10:
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
Then run generate and build again.
Depending on the situation, instead of running generate again, run go get as the error message suggests:
go122 get github.com/ollama/ollama/convert
Then build again:
go122 build .
Done!
Let's test it:
./ollama help | head -n 5
Large language model runner
Usage:
ollama [flags]
ollama [command]
The build succeeded!
Installing Ollama via pkg on FreeBSD 14.2
Install it directly with pkg:
sudo pkg install ollama
[1/1] Upgrading ollama from 0.3.6_3 to 0.3.6_4...
[1/1] Extracting ollama-0.3.6_4: 100%
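Running ollama
First, start the ollama server (./ollama here is the binary built in the previous sections; a pkg-installed ollama is run simply as ollama):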
./ollama serve
Run the llama3 model:
./ollama run llama3
ollama downloads the model automatically; once the download completes, it switches to an interactive prompt.
You can also test with a smaller model; see the CSDN blog post "smollm:135m, an ultra-small LLM".
Interactive output from ollama
A single answer took 50 minutes, but at least it runs on FreeBSD!
[skywalk@fb12 ~/gihub.com/ollama]$ ./ollama run llama3
[GIN] 2024/07/15 - 12:01:47 | 200 | 466.704µs | 10.0.0.12 | HEAD "/"
[GIN] 2024/07/15 - 12:01:47 | 404 | 450.54µs | 10.0.0.12 | POST "/api/show"
pulling manifest ⠦ time=2024-07-15T12:01:50.016+08:00 level=INFO source=download.go:136 msg="downloading 6a0746a1ec1a in 47 100 MB part(s)"
pulling manifest
pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.786897ms | 10.0.0.12 | POST "/api/show"
[GIN] 2024/07/15 - 12:22:06 | 200 | 1.384117ms | 10.0.0.12 | POST "/api/show"
time=2024-07-15T12:22:06.288+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠴ time=2024-07-15T12:22:20.820+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T12:22:20.821+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 62268"
time=2024-07-15T12:22:20.847+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T12:22:20.847+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x10139f812000","timestamp":1721017340}
⠦ {"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x10139f812000","timestamp":1721017340}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x10139f812000","timestamp":1721017340,"total_threads":4}
⠧ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
⠇ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠙ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
⠹ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 ''
llm_load_print_meta: EOS token = 128009 ''
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 ''
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
⠸ llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
⠦ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
⠧ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"62268","tid":"0x10139f812000","timestamp":1721017395}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":37211,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":60236,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":43135,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":31620,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56527,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":53213,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":21875,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":47567,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56264,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
⠇ [GIN] 2024/07/15 - 12:23:15 | 200 | 1m8s | 10.0.0.12 | POST "/api/chat"
>>> hello
time=2024-07-15T14:22:47.710+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠋ time=2024-07-15T14:23:02.785+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠙ time=2024-07-15T14:23:02.789+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 61604"
time=2024-07-15T14:23:02.811+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T14:23:02.812+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x20da49412000","timestamp":1721024582}
{"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x20da49412000","timestamp":1721024582}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x20da49412000","timestamp":1721024582,"total_threads":4}
⠸ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
⠼ llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠧ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
⠇ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 ''
llm_load_print_meta: EOS token = 128009 ''
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 ''
llm_load_tensors: ggml ctx size = 0.15 MiB
⠙ llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
⠴ llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
⠦ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"61604","tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":48229,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33319,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":54187,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":28162,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33773,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":19633,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":35779,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":18413,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠧ {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠇ {"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠏ {"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1817,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":10,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1841,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
Hello! It's nice to meet you. Is there something I can help you with, or
would you like to chat?
{"function":"print_timings","level":"INFO","line":276,"msg":"prompt eval time = 106459.91 ms / 10 tokens (10645.99 ms per token, 0.09 tokens per second)","n_prompt_tokens_processed":10,"n_tokens_second":0.09393207617523164,"slot_id":0,"t_prompt_processing":106459.906,"t_token":10645.990600000001,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":290,"msg":"generation eval time = 2868918.63 ms / 26 runs (110343.02 ms per token, 0.01 tokens per second)","n_decoded":26,"n_tokens_second":0.00906264811318913,"slot_id":0,"t_token":110343.0241923077,"t_token_generation":2868918.629,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":299,"msg":" total time = 2975378.54 ms","slot_id":0,"t_prompt_processing":106459.906,"t_token_generation":2868918.629,"t_total":2975378.535,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"update_slots","level":"INFO","line":1649,"msg":"slot released","n_cache_tokens":36,"n_ctx":2048,"n_past":35,"n_system_tokens":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721027627}
[GIN] 2024/07/15 - 15:13:47 | 200 | 50m59s | 10.0.0.12 | POST "/api/chat"
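The print_timings lines near the end of the log account for the roughly 50-minute answer. The sketch below recomputes the per-token figures from the raw millisecond totals printed there (106459.906 ms for 10 prompt tokens, 2868918.629 ms for 26 generated tokens); rate is a hypothetical helper. The system_info line reports AVX = 0, AVX2 = 0 and FMA = 0, which likely explains much of the slowness of this CPU-only run.

```go
package main

import "fmt"

// rate converts a timing reported as total milliseconds for n tokens
// into milliseconds per token and tokens per second.
func rate(totalMs float64, n int) (msPerToken, tokensPerSec float64) {
	msPerToken = totalMs / float64(n)
	tokensPerSec = 1000 / msPerToken
	return
}

func main() {
	promptMs, genMs := 106459.906, 2868918.629 // t_prompt_processing, t_token_generation
	pm, ps := rate(promptMs, 10)
	gm, gs := rate(genMs, 26)
	fmt.Printf("prompt:     %.2f ms/token, %.3f tokens/s\n", pm, ps)
	fmt.Printf("generation: %.2f ms/token, %.4f tokens/s\n", gm, gs)
	fmt.Printf("total:      %.1f minutes\n", (promptMs+genMs)/60000)
}
```

This gives about 0.094 tokens/s for the prompt, about 0.0091 tokens/s for generation and about 49.6 minutes in total, matching the rounded 0.09 and 0.01 in the log. The GIN line's 50m59s covers the whole /api/chat request, so the remaining minute or so is presumably model loading.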