A Record of Instruction Fine-tuning Qwen-7B-Chat

The writing here is a bit casual. The teacher provided four topics to choose from:

  • YOLO Object Detection - 6 classes
  • Unet Object Segmentation - 3 classes
  • Fine-tuning large models
  • Poetry continuation based on LSTM.

The first two topics wouldn't teach me anything new, and the fourth is too much trouble, what with tokenization and dataset cleaning.

So I went with the third. I haven't done it before and figured I could learn something while having some fun. The assignment calls for a doctor model that can hold a conversation, though what I actually want to build is a cyber girlfriend =-=.

The girlfriend will have to wait until next time.

I'm quite happy that I can run LoRA fine-tuning locally. The dataset still needs some ideas and references.

Instruction Fine-tuning#

Instruction fine-tuning usually means training the model on a dataset in the following format:

{
  "instruction": "Now you are to play the role of the woman beside the emperor -- Zhen Huan",
  "input": "Who are you?",
  "output": "My father is Zhen Yuandao, the minor official of the Dali Temple."
}

There is a very interesting project that fine-tunes a large model on Zhen Huan's dialogue; it seems to be called Chat-Huanhuan.

However, what I'm doing this time is a bit less interesting: a doctor that answers questions about symptoms and medical issues. Given the choice, I would still rather fine-tune a cyber girlfriend.

But for now, let's familiarize ourselves with the process, and later we can fine-tune as we wish.

The large-model tooling has wrapped up the tokenizer and everything else; we just need to drop the dataset in.

Embeddings, masks, and attention mechanisms don't have to be thought about at all, which lets even a lazy person like me play along.
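For intuition, here is a rough sketch of what that wrapping amounts to for each record. It is only the general idea; the prompt template, maximum length, and loss-masking convention below are my own assumptions, not llama-factory's exact code.

# Rough sketch of what the framework handles per record; template and max_len are illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

def build_example(rec, max_len=512):
    prompt = f"{rec['instruction']}\n{rec['input']}\n"   # question side
    answer = rec["output"]                               # answer side
    prompt_ids = tok(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tok(answer, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + answer_ids)[:max_len]
    # Loss is computed only on the answer: prompt positions are masked out with -100.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}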

The QA dataset I created:

Link: https://pan.baidu.com/s/1VX-ffLlXjShXJOUlkAZIwA?pwd=tv2y Extraction code: tv2y

A preview of the first three entries:

{
    "instruction": "Headache and dizziness for more than ten years. Wind and sun exposure. Noise causes headaches, what should be done if no results are found?",
    "input": "",
    "output": ": Considered to be neural headache, a functional disease caused by nervous dysfunction due to stress or mental factors, with some genetic factors. Generally, consider taking some traditional Chinese medicine like Chuanxiong Cha Tiao San for treatment. Combining acupuncture and massage will yield better results."
},
{
    "instruction": "Eye fatigue, can't keep them open, dry",
    "input": "",
    "output": "Generally caused by conjunctivitis or dry inflammation, you can check at the hospital."
},
{
    "instruction": "What is under a woman's left breast? What does it mean if it hurts there?",
    "input": "",
    "output": "The heart is in that position, and angina can occur there. It could also be caused by stomach pain, which is located slightly lower than the heart."
}

Some questions may correspond to multiple answers.

The dataset source is: zhangsheng93/cMedQA2

I just did some simple processing to convert the dataset into JSON format.

Additionally, I filtered out answers longer than about 100 to 150 characters, because I don't have enough GPU memory; even if they were fed in, they would have to be truncated anyway.
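For reference, a minimal sketch of the kind of conversion and filtering I mean; this is not my exact script, and the CSV file and column names are assumptions, so check the actual files in cMedQA2.

# Sketch: merge cMedQA2 questions and answers into instruction/input/output records.
# File and column names are assumptions; adjust them to the real CSVs.
import json
import pandas as pd

questions = pd.read_csv("question.csv")   # assumed columns: question_id, content
answers = pd.read_csv("answer.csv")       # assumed columns: ans_id, question_id, content

merged = answers.merge(questions, on="question_id", suffixes=("_ans", "_q"))
records = [
    {"instruction": row["content_q"], "input": "", "output": row["content_ans"]}
    for _, row in merged.iterrows()
    if len(row["content_ans"]) <= 150     # drop long answers to fit my GPU memory
]

with open("qa_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)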

There are two training approaches here: one is what I tried first on a small dataset, training the top 100 entries for 60 epochs to make the model memorize the answers; the other is enlarging the dataset and training for only three or four epochs.

I haven't decided yet.

[Image: training loss curve]

This is the loss curve from my training. If I train for only two or three epochs, the model has only skimmed the data, and I doubt it will change its original way of answering much, even though some people will say that an LLM's training loss is not a very meaningful signal.

But I do want the model to change its original answering style, which lists every possibility one by one. I'd rather it give just one possibility, like my dataset does, together with a suggested remedy. Only if it learns that way of speaking will I consider the fine-tuning successful.

Because if I want to train a cyber girlfriend in the future, I don't want her to talk to me like this:

What do you want to eat tonight? I think these are good:

- Hotpot, eating hotpot in winter is the best choice
- Sushi, sushi is a type of Japanese cuisine (it seems to have some strange things mixed in, this was written by copilot)
- Roast duck, roast duck is a specialty dish from Beijing
- Grilled fish, grilled fish is a specialty dish from Hunan

I hope she tells me directly:

Let's eat hotpot tonight; it's the best in winter.

Exciting!

Prompting can change some of the model's answering style, but it doesn't seem able to fundamentally turn the first style into the second, which is what I'm after. So I chose fine-tuning, and I chose to train longer.

After half an hour of fine-tuning on the top 100 entries for 60 epochs, the results on the training set are quite satisfying:

[Images: chat screenshots of answers to training-set questions]

I don't quite agree with the dog-meat stew suggestion, though; that one leaves me a bit puzzled.

[Image: another chat screenshot]

It seems to be related to dog meat. But there is diversity; I remember there was also mention of eating Liuwei Dihuang Wan =-=.

This is the corresponding QA pair:

"instruction": "Is sweating every morning before getting up curable? I have a stomach disease but have not used medication before. Is there any folk remedy for sweating every morning before getting up?",
"input": "",
"output": "Easily sweating may indicate qi deficiency; you can use astragalus to brew tea and drink more chicken soup. You can also take some Bu Zhong Yi Qi Wan, Shen Ling Bai Zhu San, and Bu Zhong Yi Qi Tang Wan."

Let me try another one outside the training set.

[Image: chat screenshot for a question outside the training set]

I must say, I highly agree with this.

copilot, do you think it's better to run at night or in the morning?

- Morning running helps improve metabolism and aids in weight loss.
- Night running helps relax the body and mind and aids in sleep.
- Morning running helps improve metabolism and aids in weight loss.
- Night running helps relax the body and mind and aids in sleep.

This was written by copilot.

Don't you know that running after nine o'clock at night makes it harder to sleep?

Let's stop there and get back on track.

However, I did think of a training scheme: first train many epochs on the small dataset, then a few epochs on the large one, so the model can learn both the way of speaking and more knowledge from the large dataset.

But this requires resuming from a checkpoint, and I'm not sure whether llama-factory needs any extra steps for that.

Fine-tuning Tools:#

datawhalechina/self-llm

This repository documents the fine-tuning process for many large models in Jupyter Notebook + Markdown form, and you can basically run through it as-is.

It is about as low-effort as it gets, but there are still pitfalls: as Python library versions moved on, transformers>4.35.0 introduced some breaking changes, so many of the models no longer run the way they used to.

Even when I tried to reproduce the same environment as the author, I still hit issues, such as a WinError "file not found".

At this point, Docker is needed.

After a day of torment, I decisively switched to Docker.

4060Ti16G Graphics Card Graphical Fine-tuning Training Tongyi Qianwen Qwen Model (suitable for novice friends)

Someone has packaged llama-factory into a Docker image, so I only need to pull the image and can run it directly.

The only downside is that he didn't pin versions when installing the extra packages later:

pip install einops transformers_stream_generator optimum auto-gptq -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

By default this pulls the latest versions, and since quite some time has passed between the video and my attempt, new issues can crop up; so once I get it running I will probably pin the versions.

Still, the benefit of Docker is that you don't have to worry about problems caused by slight differences in the system environment. The starting point is already much higher.

Sure enough, this still stalled me for an hour, but fortunately my graphics card can be accessed from inside the container directly, which saves a lot of trouble. I recorded the versions once I finished testing.

The final confirmed versions are:

>>> import einops
>>> einops.__version__
'0.8.0'
>>> import transformers
>>> transformers.__version__
'4.34.1'

(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show transformers_stream_generator
Name: transformers-stream-generator
Version: 0.0.5

>>> import datasets
>>> datasets.__version__
'2.14.6'

(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show optimum
Name: optimum
Version: 1.23.3

(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show auto-gptq
Name: auto-gptq
Version: 0.6.0

If written as requirements.txt:

einops==0.8.0
transformers==4.34.1
transformers-stream-generator==0.0.5
datasets==2.14.6
optimum==1.23.3
auto-gptq==0.6.0

In addition to these package issues, you also need to get --gpus all working. If you are on Windows, you can refer to Windows allows Docker to support NVIDIA Container Toolkit.

Doing these things is certainly most convenient on Linux, but unfortunately, my computer is at home, and I can only connect via Windows.

Steps:#

1. Download the dataset and place it in the data folder.#

You need a dataset_info.json; refer to the link to my dataset above.

Note that the sha1 key inside must not be deleted. As I recall it is used for verification, but even reusing a stale value left over from before still works.

However, if you delete it, llama-factory will keep reporting that dataset_info.json cannot be found.
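For reference, the entry for my dataset looks roughly like this; the field names are as I remember them from this llama-factory version, and the sha1 value below is just a placeholder (a stale one also worked for me).

{
  "qa_top100": {
    "file_name": "qa_top100.json",
    "file_sha1": "replace-with-the-sha1-of-qa_top100.json"
  }
}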

2. Download the Docker image:#

docker pull bucess/llama-factory:1

I am not using the official image.

The download may take a while, and some layers sometimes seem to get stuck, but don't worry; find a relatively stable network, turn on your accelerator, set up your proxy, and then wait.

Once you have the image, starting it is quite fast each time. After the container has been created, you can simply bring it back next time with docker start -i llama-factory.

3. Start the container:#

docker run -it --name llama-factory --gpus all --network host --shm-size 4g -v D:\senmen\data:/LLaMA-Factory/data bucess/llama-factory:1 /bin/bash

You need to change D:\senmen\data to the location of your own dataset. If you are confident, you can also put the model in that mounted folder; just be careful not to delete it by accident.

The best approach is to copy the model into the container with docker cp.

4. Add Python packages and start llama-factory#

pip install einops==0.8.0 transformers==4.34.1 transformers-stream-generator==0.0.5 datasets==2.14.6 optimum==1.23.3 auto-gptq==0.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

-i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

These options are quite useful because the container seems to have no proxy by default; using the mirror avoids having to set a temporary http_proxy.

Modify src/train_web.py (optional):

After I started the gradio application, I couldn't access the panel via ip + port locally, so I used gradio's share parameter.

(llama-factroy) root@docker-desktop:/LLaMA-Factory# cat src/train_web.py
from llmtuner import create_ui


def main():
    demo = create_ui()
    demo.queue()
    demo.launch(share=True, inbrowser=True)


if __name__ == "__main__":
    main()

It is worth noting that you need to download this:

https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64

Windows keeps flagging it as a virus and deleting it automatically.

You need to rename it and place it in the right location; if you run with share=True and this file is missing, gradio will throw an error. I won't elaborate here; you can also refer to:

https://github.com/gradio-app/gradio/issues/8186

That issue contains the details of the error.

5. Start llama-factory#

python src/train_web.py

If you made the change in step 4, this will generate a temporary public URL that you can open from any computer.

Something like this: https://d6cdc0f5cda64dd72b.gradio.live/

Since my computer is at home, and I need to train, call, and demonstrate at school, this is very convenient.

6. Load our top100 dataset and train#

Everything after this is done in the web UI, so I won't write it all out here.

You need to change the file name in dataset_info.json to "qa_top100.json".

For specific operations, you can refer to: 4060Ti16G Graphics Card Graphical Fine-tuning Training Tongyi Qianwen Qwen Model (suitable for novice friends)

Interestingly, it trains on the int4-quantized model and finally exports the result as float32, which not only lowers the hardware barrier for training but also makes inference very fast with the exported, non-quantized model.

This directly solves my biggest problem: I previously tried to train the non-quantized model, and even with LoRA and batch_size set to 1 it still blew past my GPU memory, and it was unstable, causing black screens and flickering over remote desktop.

Following this workflow, I set batch_size to 8; GPU memory usage sits around 13 GB, essentially fully utilized, yet it no longer disturbs my remote desktop, which is great.
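Conceptually, that workflow is just LoRA on top of the GPTQ int4 base model, with the trained adapter merged into a non-quantized base at export time. Here is a rough sketch of the training side outside the web UI; it is not the UI's exact internals, and the hyperparameters are made up.

# Sketch: attach LoRA adapters to the int4 (GPTQ) Qwen model, so only the small
# adapter weights are trained while the quantized base stays frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int4", device_map="auto", trust_remote_code=True
)
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only a small fraction of parameters are trainable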

7. Checkpoint continuation (not done yet)#

This is the last thing I want to do: train many epochs on the small dataset first, then a few epochs on the large one, so the model can learn both the way of speaking and more knowledge from the large dataset.
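In plain transformers + peft terms, continuing from a checkpoint just means reloading the saved LoRA adapter in trainable mode before the second run. A sketch of the idea (the paths are made up, and I assume the llama-factory web UI exposes this as selecting the previous checkpoint/adapter before training again):

# Sketch: load the LoRA adapter from the first run (small dataset, many epochs)
# and keep it trainable, so the second run (large dataset, few epochs) updates it further.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(
    base, "saves/qwen-7b-chat/lora/top100-60ep", is_trainable=True   # made-up path
)
# ...then build the trainer on the large dataset and train a few more epochs as usual.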
