How to Install Ollama on FreeBSD

Ollama is an open-source framework for running large language models (LLMs) locally. It supports a range of operating systems, but FreeBSD is not among them, so I tried compiling and installing it on FreeBSD myself.

In short: the official Ollama sources failed to compile, but a custom fork built successfully. Since that fork carries modified code, I did all the work inside a FreeBSD jail as a security precaution.

Update, May 29, 2025: FreeBSD 14.2 now ships an Ollama pkg, so it can be installed directly with pkg install. See "Installing Ollama on FreeBSD and trying the DeepSeek r1 model" for details.

Installing Ollama on FreeBSD (first attempt, failed)

Setting up the build environment

First, install the latest Go.

pkg install go122-1.22.5 cmake

I later found this did not work, so I installed the default Go package instead (the versioned package installs the command as go122).

pkg install go

However, this version is outdated.

So I tried a newer release instead, downloaded from https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz:

wget https://go.dev/dl/go1.22.5.freebsd-amd64.tar.gz

Extract the archive.

tar -xzvf go1.22.5.freebsd-amd64.tar.gz

Add it to the PATH.

export PATH=/home/skywalk/work/go/bin:$PATH

Go is now at version 1.22.5.

$ go version
go version go1.22.5 freebsd/amd64
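The export above works because prepending a directory to PATH makes the shell resolve that Go first, ahead of any pkg-installed /usr/local/bin/go. A minimal sketch of this precedence behavior, using a throwaway mock `go` script in /tmp (a stand-in for the real extraction directory, /home/skywalk/work/go/bin in this article):

```shell
# Create a mock 'go' binary to stand in for the extracted toolchain.
mkdir -p /tmp/go-demo/bin
printf '#!/bin/sh\necho go version go1.22.5 freebsd/amd64\n' > /tmp/go-demo/bin/go
chmod +x /tmp/go-demo/bin/go

# Prepend it, exactly as done above with the real directory.
export PATH=/tmp/go-demo/bin:$PATH

command -v go   # resolves to the prepended directory first
go version      # runs the mock, not any system go
```

Note the export only affects the current shell session; to make it permanent, add the same line to your shell's startup file.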

Speed up Go module downloads.

# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Repositories that should bypass the proxy (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private

Compiling Ollama

Clone Ollama from the official repository.

git clone https://github.com/ollama/ollama

Run code generation.

go generate ./...

Build.

go build . 

The build failed here, ending with this error:

skywalk@fbhost:~/github/ollama $ go build .
package github.com/ollama/ollama
	imports github.com/ollama/ollama/cmd
	imports github.com/ollama/ollama/server
	imports github.com/ollama/ollama/gpu: C source files not allowed when not using cgo or SWIG: gpu_info_cudart.c gpu_info_nvcuda.c gpu_info_nvml.c gpu_info_oneapi.c

Debugging inside a FreeBSD jail (second attempt, failed)

Create a FreeBSD jail and log in.

# cbsd jlogin fb12

The login shell is csh; switch to bash if you prefer.

Install the required packages.

# pkg install -y git go122 cmake vulkan-headers vulkan-loader

Download the custom fork.

# git clone --depth 1 https://github.com/prep/ollama.git

Switch to the BSD-support branch (I skipped this step here):

# cd ollama && git checkout feature/add-bsd-support

First, configure the module proxy.

For csh:

# setenv GO111MODULE on

# setenv GOPROXY https://goproxy.io,direct

# setenv GOPRIVATE git.mycompany.com,github.com/my/private

For bash:

# Enable Go modules
export GO111MODULE=on
# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Repositories that should bypass the proxy (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private

Run go generate and the build.

# go122 generate ./...

# go122 build .

It ultimately failed with this error:

go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

Compiling the custom Ollama fork as a regular user in a FreeBSD jail (third attempt, success)

To get past the error, the go.sum and go.mod files need to be modified.

Use the following commands:

mkdir github.com
cd github.com

git clone https://github.com/prep/ollama.git

cd ollama && git checkout feature/add-bsd-support

# Enable Go modules

export GO111MODULE=on

# Set the GOPROXY environment variable
export GOPROXY=https://goproxy.io,direct
# Repositories that should bypass the proxy (optional)
export GOPRIVATE=git.mycompany.com,github.com/my/private

go122 generate ./...

go122 build .

Debugging the error

There was still an error:

go122 build .
go: downloading github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
convert/gemma.go:12:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9
convert/gemma.go:13:2: github.com/pdevine/tensor@v0.0.0-20240228013915-64ccaa8d9ca9: invalid version: unknown revision 64ccaa8d9ca9

Edit the go.sum file and change the pdevine/tensor entries to:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=

Also edit go.mod and bump pdevine/tensor to the latest (May 10) version:

github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
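The go.mod edit above can also be scripted with sed. A sketch against a mock go.mod in /tmp (run the same substitution against the real go.mod and go.sum in the ollama checkout; note that in go.sum the hash suffixes must also change to the ones shown above, since they differ per version):

```shell
# The old (broken) and new pdevine/tensor pseudo-versions from the article.
OLD='v0.0.0-20240228013915-64ccaa8d9ca9'
NEW='v0.0.0-20240510204454-f88f4562727c'

# Mock go.mod standing in for the real one in the ollama checkout.
cat > /tmp/go.mod <<EOF
module github.com/ollama/ollama

require github.com/pdevine/tensor $OLD
EOF

# Swap the pseudo-version in place (the -i.bak form works with both BSD and GNU sed).
sed -i.bak "s|github.com/pdevine/tensor $OLD|github.com/pdevine/tensor $NEW|" /tmp/go.mod
grep tensor /tmp/go.mod
```

Alternatively, `go get github.com/pdevine/tensor@latest` inside the checkout would update both files in one step, at the cost of pulling whatever the current latest revision is.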

Then run generate and build again.

Depending on the situation, you can skip re-running generate and instead run go get as the error message suggests:

go122 get github.com/ollama/ollama/convert

Then build again.

go122 build .

Done!

Let's test it:

./ollama help | head -n 5
Large language model runner

Usage:
ollama [flags]
ollama [command]
The compile really did succeed!

Installing Ollama via pkg on FreeBSD 14.2

Install it directly with pkg.

sudo pkg install ollama

[1/1] Upgrading ollama from 0.3.6_3 to 0.3.6_4...
[1/1] Extracting ollama-0.3.6_4: 100%
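With the pkg version, you can also run Ollama as a regular FreeBSD service at boot instead of starting it by hand. This is a sketch of the usual rc.conf pattern, under the assumption that the package installs an rc script named "ollama" (verify first with `ls /usr/local/etc/rc.d/`):

```shell
# /etc/rc.conf fragment — enable the ollama service at boot.
# Assumption: the pkg ships an rc script /usr/local/etc/rc.d/ollama.
ollama_enable="YES"
```

Then start it with `service ollama start` (or set the line with `sysrc ollama_enable=YES` instead of editing rc.conf by hand).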

Start Ollama.

First, start the ollama service.

./ollama serve

Run the llama3 model.

./ollama run llama3

Ollama downloads the model automatically; once the download finishes, it drops into an interactive prompt.

You can also test with a smaller model, such as the ultra-small LLM smollm:135m (covered in a separate CSDN blog post).

Ollama's interactive output

A single answer took 50 minutes, but at least it runs on FreeBSD!

[skywalk@fb12 ~/gihub.com/ollama]$ ./ollama run llama3
[GIN] 2024/07/15 - 12:01:47 | 200 |     466.704µs |       10.0.0.12 | HEAD     "/"
[GIN] 2024/07/15 - 12:01:47 | 404 |      450.54µs |       10.0.0.12 | POST     "/api/show"
pulling manifest 
time=2024-07-15T12:01:50.016+08:00 level=INFO source=download.go:136 msg="downloading 6a0746a1ec1a in 47 100 MB part(s)"
pulling 6a0746a1ec1a... 100% ▕████████████████▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕████████████████▏  12 KB                         
pulling 8ab4849b038c... 100% ▕████████████████▏  254 B                         
pulling 577073ffcc6c... 100% ▕████████████████▏  110 B                         
pulling 3f8eb4da87fa... 100% ▕████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
[GIN] 2024/07/15 - 12:22:06 | 200 |    1.786897ms |       10.0.0.12 | POST     "/api/show"
[GIN] 2024/07/15 - 12:22:06 | 200 |    1.384117ms |       10.0.0.12 | POST     "/api/show"
time=2024-07-15T12:22:06.288+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠴ time=2024-07-15T12:22:20.820+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
time=2024-07-15T12:22:20.821+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 62268"
time=2024-07-15T12:22:20.847+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T12:22:20.847+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x10139f812000","timestamp":1721017340}
⠦ {"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x10139f812000","timestamp":1721017340}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x10139f812000","timestamp":1721017340,"total_threads":4}
⠧ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
⠇ llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠙ llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
⠹ llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 ''
llm_load_print_meta: EOS token        = 128009 ''
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 ''
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
.......................................................................................
⠸ llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
⠦ llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
⠧ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x10139f812000","timestamp":1721017395}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"62268","tid":"0x10139f812000","timestamp":1721017395}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":37211,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":60236,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":43135,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":31620,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56527,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":53213,"status":200,"tid":"0x1013dbe0a000","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":21875,"status":200,"tid":"0x1013dbe0ae00","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":47567,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x10139f812000","timestamp":1721017395}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":56264,"status":200,"tid":"0x1013dbe0a700","timestamp":1721017395}
⠇ [GIN] 2024/07/15 - 12:23:15 | 200 |          1m8s |       10.0.0.12 | POST     "/api/chat"
>>> hello
time=2024-07-15T14:22:47.710+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠋ time=2024-07-15T14:23:02.785+08:00 level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
⠙ time=2024-07-15T14:23:02.789+08:00 level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama1084183988/runners/cpu/ollama_llama_server --model /home/skywalk/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 61604"
time=2024-07-15T14:23:02.811+08:00 level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-07-15T14:23:02.812+08:00 level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"0x20da49412000","timestamp":1721024582}
{"build":2770,"commit":"952d03db","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"0x20da49412000","timestamp":1721024582}
{"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"0x20da49412000","timestamp":1721024582,"total_threads":4}
[... llama_model_loader / llm_load_print_meta output identical to the first run ...]
⠦ {"function":"initialize","level":"INFO","line":448,"msg":"initializing slots","n_slots":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"initialize","level":"INFO","line":460,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","level":"INFO","line":3065,"msg":"model loaded","tid":"0x20da49412000","timestamp":1721024651}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3268,"msg":"HTTP server listening","n_threads_http":"3","port":"61604","tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1579,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":0,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":1,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":48229,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":2,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33319,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":3,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":54187,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":4,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":28162,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":33773,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":5,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":6,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":19633,"status":200,"tid":"0x20da85a0ae00","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":7,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":35779,"status":200,"tid":"0x20da85a0a000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":18413,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠧ {"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":8,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":9,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠇ {"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/tokenize","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
{"function":"process_single_task","level":"INFO","line":1511,"msg":"slot data","n_idle_slots":1,"n_processing_slots":0,"task_id":10,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"log_server_request","level":"INFO","line":2742,"method":"GET","msg":"request","params":{},"path":"/health","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721024651}
⠏ {"function":"launch_slot_with_data","level":"INFO","line":833,"msg":"slot is processing task","slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1817,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":10,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
{"function":"update_slots","level":"INFO","line":1841,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721024651}
Hello! It's nice to meet you. Is there something I can help you with, or 
would you like to chat?{"function":"print_timings","level":"INFO","line":276,"msg":"prompt eval time     =  106459.91 ms /    10 tokens (10645.99 ms per token,     0.09 tokens per second)","n_prompt_tokens_processed":10,"n_tokens_second":0.09393207617523164,"slot_id":0,"t_prompt_processing":106459.906,"t_token":10645.990600000001,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":290,"msg":"generation eval time = 2868918.63 ms /    26 runs   (110343.02 ms per token,     0.01 tokens per second)","n_decoded":26,"n_tokens_second":0.00906264811318913,"slot_id":0,"t_token":110343.0241923077,"t_token_generation":2868918.629,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"print_timings","level":"INFO","line":299,"msg":"          total time = 2975378.54 ms","slot_id":0,"t_prompt_processing":106459.906,"t_token_generation":2868918.629,"t_total":2975378.535,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627}
{"function":"update_slots","level":"INFO","line":1649,"msg":"slot released","n_cache_tokens":36,"n_ctx":2048,"n_past":35,"n_system_tokens":0,"slot_id":0,"task_id":11,"tid":"0x20da49412000","timestamp":1721027627,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2742,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"10.0.0.12","remote_port":36742,"status":200,"tid":"0x20da85a0a700","timestamp":1721027627}
[GIN] 2024/07/15 - 15:13:47 | 200 |        50m59s |       10.0.0.12 | POST     "/api/chat"


Tags: FreeBSD, Ollama, LLM, compiling, custom fork

Posted May 13, 06:50