Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ChatRole.cs #5753

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Update ChatRole.cs #5753

wants to merge 3 commits into from

Conversation

Licantrop0
Copy link

@Licantrop0 Licantrop0 commented Dec 19, 2024

Adds developer role for o1 and newer models.
See: Developer messages

Microsoft Reviewers: Open in CodeFlow

Adds `developer` role for `o1` and newer models.

/// <summary>Gets the role that sets developer-provided instructions the model should follow, regardless of messages sent by the user.</summary>
/// <remarks>With o1 models and newer, developer messages replace the previous system messages.</remarks>
public static ChatRole Developer { get; } = new("developer");
Copy link
Member

@stephentoub stephentoub Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we want to add this. The other roles are all fairly common across multiple services. Right now this one is specific to OpenAI. We don't have "model" as a first-class API, for example, even though it's used by gemini. Until the library is updated, this won't even work with the openai provider.

Copy link
Author

@Licantrop0 Licantrop0 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Developer messages would effectively replace system messages for all new OpenAI models:
https://cdn.openai.com/spec/model-spec-2024-05-08.html#overview

Using system messages will throw errors.
How would you like to proceed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the linked document:

image

According to this, "Developer" simply replaces "System". So I would expect the OpenAI adapter to transmit "System" messages as "Developer" (if it knows which OpenAI models expect System and which expect Developer).

This seems like an unfortunate design that could confuse the ecosystem. Maybe there's a good reason for it but it's not clear from the naming alone.

If it does end up creating widespread confusion we could add ChatRole.Developer with the expectation that other clients will transmit it as "System", and then consumers can use whichever role name they prefer. But again, it's a shame that the ecosystem would be put in this position.

Copy link
Author

@Licantrop0 Licantrop0 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, if you try to send system messages with o1, it will throw an error saying that system messages are not supported.
@SteveSandersonMS, you are suggesting adding custom code in CompleteAsync / CompleteStreamingAsync to transform every system message in a developer message?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Licantrop0 Yes, though it would have to be only for the models that require this. But it still might not work end-to-end - it depends whether the underlying OpenAI client supports it yet.

It would be interesting to try this and see if it does work, and how confident we can be about mapping System->Developer based on the model. I don't know if this would be a valid solution for Azure OpenAI since in that case we don't necessarily know what model is being used, but only the deployment name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we would have to scan through the full list of chat messages replacing system with developer every time a completion is requested, and it's not very efficient. And yes, as you pointed out some heuristics to understand the model from the deployment name needs to be in place.

I understand that the developer role is specifically to OpenAI, and adding it in the AI.Abstraction project is not clean architecture-wise, but leaving the choice to the end developer, it could be the best/safest option we have.

I still haven't really tested if it works (access to o1 requires spending 1k in OpenAI credits, and the non-preview version is still not available in Azure), but for now I'm using this code:

var  DevRole = new ChatRole("developer");
List<ChatMessage> chatMessages = [new ChatMessage(DevRole, DeveloperPromptO1)];
chatMessages.Add(new ChatMessage(ChatRole.User, userInput));

var asyncUpdates = OpenAIHelpers.OpenAIClient.CompleteStreamingAsync(chatMessages);
await foreach (var update in asyncUpdates)
{
   response.Append(update);
   Console.Write(update);
}

// Streaming doesn't add Assistant replies to the history, so we add the last one here
chatMessages.Add(new ChatMessage(ChatRole.Assistant, response.ToString()));

@stephentoub / @SteveSandersonMS: do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we would have to scan through the full list of chat messages replacing system with developer every time a completion is requested, and it's not very efficient

There's no meaningful additional overhead here. Every request already entails translating from the M.E.AI object to the target, either to some other object model or to the request body being written out. It already needs to look at every message's role, it's just a question of what it translates it to.

but leaving the choice to the end developer, it could be the best/safest option we have

M.E.AI.Abstractions provides an abstraction. It needs to be mapped to the underlying target and is rarely always 1:1. It's part of the deal that things may not be represented in the abstraction exactly how they are under the covers.

I still haven't really tested if it works

It will not work as intended today, as the OpenAI 2.1.0 package has no knowledge of this new role type.

I suggest efforts at this point would be better spent learning why it is OpenAI felt a need to create this new role type. At the moment it seems like it's simply a rename, which makes the current decisions around not having any compatibility between "system" and "developer" very confusing.

Regardless, while I do appreciate the effort, the PR as it currently stands is not a complete solution and not something to be merged. It's adding a public ChatRole that won't be appropriately recognized by the only target that could even support it today.

@dotnet-comment-bot
Copy link
Collaborator

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=900944&view=codecoverage-tab

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants