The previous posts all used Semantic Kernel to call the OpenAI API, which of course costs money and also comes with usage limits. This post shows how to use the open-source model LLama3 in SK instead.
First, add the NuGet packages. We use the third-party package LLamaSharp here; since there is no GPU available, the model has to run on the CPU, so the corresponding CPU backend package is needed as well, and finally the Semantic Kernel integration package for LLamaSharp.
<ItemGroup>
  <PackageReference Include="LLamaSharp" Version="0.11.2" />
  <PackageReference Include="LLamaSharp.Backend.Cpu" Version="0.11.2" />
  <PackageReference Include="LLamaSharp.semantic-kernel" Version="0.11.2" />
</ItemGroup>
Next, download the latest LLama3 model (the file extension is gguf). The code below is all it takes to get the small local model running.
using LLama.Common;
using LLama;
using LLamaSharp.SemanticKernel.ChatCompletion;
using System.Text;
using ChatHistory = LLama.Common.ChatHistory;
using AuthorRole = LLama.Common.AuthorRole;
await SKRunAsync();
async Task SKRunAsync()
{
    var modelPath = @"C:\llama\llama-2-coder-7b.Q8_0.gguf";
    var parameters = new ModelParams(modelPath)
    {
        ContextSize = 1024,      // token context window for the session
        Seed = 1337,             // fixed seed so runs are reproducible
        GpuLayerCount = 5,       // layers offloaded to GPU; ignored by the CPU backend
        Encoding = Encoding.UTF8,
    };
    using var model = LLamaWeights.LoadFromFile(parameters);
    var ex = new StatelessExecutor(model, parameters);
    var chatGPT = new LLamaSharpChatCompletion(ex);
    var chatHistory = chatGPT.CreateNewChat(@"This is a conversation between assistant and user.
assistant is a .NET and C# expert who accurately answers the technical questions user asks.");
    Console.WriteLine("Chat started:");
    Console.WriteLine("------------------------");
    while (true)
    {
        Console.Write("user:");
        var userMessage = Console.ReadLine();
        chatHistory.AddUserMessage(userMessage);
        var first = true;
        var content = "";
        // Stream the reply and echo each chunk to the console as it arrives.
        await foreach (var reply in chatGPT.GetStreamingChatMessageContentsAsync(chatHistory))
        {
            if (first)
            {
                first = false;
                Console.Write(reply.Role + ":");
            }
            content += reply.Content;
            Console.Write(reply.Content);
        }
        Console.WriteLine();
        // Record the full reply so the next turn sees the complete history.
        chatHistory.AddAssistantMessage(content);
    }
}
In actual use the results are quite good: apart from being a bit slow and not as strong as GPT, everything else is great. The key point is that there is no API key involved, it just runs, and there is no fear of blowing your credit card limit.
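As a closing note, the sample above calls LLamaSharpChatCompletion directly. To let the local model act as the chat-completion service of a regular Semantic Kernel Kernel (so that prompts and plugins go through SK's normal pipeline), it can be registered with the kernel builder. The following is a minimal sketch, assuming the integration package's LLamaSharpChatCompletion implements SK's IChatCompletionService interface (that is what the LLamaSharp.semantic-kernel package provides); the model path is the same placeholder used above.

using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.ChatCompletion;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var parameters = new ModelParams(@"C:\llama\llama-2-coder-7b.Q8_0.gguf")
{
    ContextSize = 1024,
};
using var model = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(model, parameters);

var builder = Kernel.CreateBuilder();
// Register the local model as the kernel's IChatCompletionService
// (assumes LLamaSharpChatCompletion implements that SK interface).
builder.Services.AddSingleton<IChatCompletionService>(
    new LLamaSharpChatCompletion(executor));
var kernel = builder.Build();

// Prompts now run against the local gguf model instead of OpenAI.
var result = await kernel.InvokePromptAsync("What is IDisposable used for in C#?");
Console.WriteLine(result);

With this in place, the rest of the SK features covered in the earlier posts should work unchanged against the local model.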