Cloning into 'llama.cpp'...
remote: Enumerating objects: 67417, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 67417 (delta 49), reused 21 (delta 21), pack-reused 67340 (from 4)
Receiving objects: 100% (67417/67417), 195.86 MiB | 32.98 MiB/s, done.
Resolving deltas: 100% (48678/48678), done.
Makefile:6: *** Build system changed:
The Makefile build has been replaced by CMake.
For build instructions see: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md. Stop.
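(That `Makefile:6: *** Build system changed` error means this checkout of llama.cpp no longer supports plain `make` — upstream replaced the Makefile build with CMake. A minimal sketch of the replacement build, following the link in the error message; the `-j` parallelism flag and Release config are my choices, not from the log:)

```shell
# Build llama.cpp with CMake instead of the removed Makefile
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release   # configure into ./build
cmake --build build --config Release -j     # compile with all cores
```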
Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: Merging model weights to 16-bit format...
config.json: 100% 891/891 [00:00<00:00, 50.3kB/s]
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
model.safetensors.index.json: 24.0k/? [00:00<00:00, 1.76MB/s]
Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 0%| | 0/4 [00:00<?, ?it/s]
model-00001-of-00004.safetensors: 100% 4.98G/4.98G [06:16<00:00, 23.8MB/s]
Unsloth: Preparing safetensor model files: 25%|██▌ | 1/4 [06:17<18:51, 377.18s/it]
model-00002-of-00004.safetensors: 100% 5.00G/5.00G [05:54<00:00, 18.8MB/s]
Unsloth: Preparing safetensor model files: 50%|█████ | 2/4 [12:11<12:07, 363.75s/it]
model-00003-of-00004.safetensors: 100% 4.92G/4.92G [05:10<00:00, 16.0MB/s]
Unsloth: Preparing safetensor model files: 75%|███████▌ | 3/4 [17:22<05:39, 339.79s/it]
model-00004-of-00004.safetensors: 100% 1.17G/1.17G [00:54<00:00, 25.0MB/s]
Unsloth: Preparing safetensor model files: 100%|██████████| 4/4 [18:17<00:00, 274.43s/it]
Note: tokenizer.model not found (this is OK for non-SentencePiece models)
Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [09:46<00:00, 146.55s/it]
Unsloth: Merge process complete. Saved to `/content/gguf_output`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: llama.cpp folder exists but binaries not found - will rebuild
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Install GGUF and other packages
Why won't it merge to GGUF? There are four .safetensors files. From what I've searched, it seems you have to clone llama.cpp, and then it says the Colab version is too old so it won't compile. My teacher is pressing me on this,
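(One hedged suggestion, based on the `Makefile:6` error in the log: the failure isn't the Colab runtime being too old, but the installed Unsloth invoking the removed `make` build of llama.cpp. Newer Unsloth releases use the CMake build, so upgrading before re-running the export may be enough. The `unsloth_zoo` package name is an assumption about the current packaging:)

```shell
# Upgrade Unsloth so its GGUF export path uses llama.cpp's CMake build
pip install --upgrade --no-cache-dir unsloth unsloth_zoo
# then restart the Colab runtime and re-run the save/export cell
```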