-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ChatRole.cs #5753
base: main
Are you sure you want to change the base?
Update ChatRole.cs #5753
Conversation
Adds `developer` role for `o1` and newer models.
|
||
/// <summary>Gets the role that sets developer-provided instructions the model should follow, regardless of messages sent by the user.</summary> | ||
/// <remarks>With o1 models and newer, developer messages replace the previous system messages.</remarks> | ||
public static ChatRole Developer { get; } = new("developer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we want to add this. The other roles are all fairly common across multiple services. Right now this one is specific to OpenAI. We don't have "model" as a first-class API, for example, even though it's used by gemini. Until the library is updated, this won't even work with the openai provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Developer messages would effectively replace system messages for all new OpenAI models:
https://cdn.openai.com/spec/model-spec-2024-05-08.html#overview
Using system messages will throw errors.
How would you like to proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the linked document:
According to this, "Developer" simply replaces "System". So I would expect the OpenAI adapter to transmit "System" messages as "Developer" (if it knows which OpenAI models expect System and which expect Developer).
This seems like an unfortunate design that could confuse the ecosystem. Maybe there's a good reason for it but it's not clear from the naming alone.
If it does end up creating widespread confusion we could add ChatRole.Developer
with the expectation that other clients will transmit it as "System", and then consumers can use whichever role name they prefer. But again, it's a shame that the ecosystem would be put in this position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, if you try to send system messages with o1, it will throw an error saying that system messages are not supported.
@SteveSandersonMS, you are suggesting adding custom code in CompleteAsync / CompleteStreamingAsync to transform every system
message in a developer
message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Licantrop0 Yes, though it would have to be only for the models that require this. But it still might not work end-to-end - it depends whether the underlying OpenAI client supports it yet.
It would be interesting to try this and see if it does work, and how confident we can be about mapping System->Developer based on the model. I don't know if this would be a valid solution for Azure OpenAI since in that case we don't necessarily know what model is being used, but only the deployment name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we would have to scan through the full list of chat messages replacing system
with developer
every time a completion is requested, and it's not very efficient. And yes, as you pointed out some heuristics to understand the model from the deployment name needs to be in place.
I understand that the developer
role is specifically to OpenAI, and adding it in the AI.Abstraction
project is not clean architecture-wise, but leaving the choice to the end developer, it could be the best/safest option we have.
I still haven't really tested if it works (access to o1
requires spending 1k in OpenAI credits, and the non-preview version is still not available in Azure), but for now I'm using this code:
var DevRole = new ChatRole("developer");
List<ChatMessage> chatMessages = [new ChatMessage(DevRole, DeveloperPromptO1)];
chatMessages.Add(new ChatMessage(ChatRole.User, userInput));
var asyncUpdates = OpenAIHelpers.OpenAIClient.CompleteStreamingAsync(chatMessages);
await foreach (var update in asyncUpdates)
{
response.Append(update);
Console.Write(update);
}
// Streaming doesn't add Assistant replies to the history, so we add the last one here
chatMessages.Add(new ChatMessage(ChatRole.Assistant, response.ToString()));
@stephentoub / @SteveSandersonMS: do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we would have to scan through the full list of chat messages replacing system with developer every time a completion is requested, and it's not very efficient
There's no meaningful additional overhead here. Every request already entails translating from the M.E.AI object to the target, either to some other object model or to the request body being written out. It already needs to look at every message's role, it's just a question of what it translates it to.
but leaving the choice to the end developer, it could be the best/safest option we have
M.E.AI.Abstractions provides an abstraction. It needs to be mapped to the underlying target and is rarely always 1:1. It's part of the deal that things may not be represented in the abstraction exactly how they are under the covers.
I still haven't really tested if it works
It will not work as intended today, as the OpenAI 2.1.0 package has no knowledge of this new role type.
I suggest efforts at this point would be better spent learning why it is OpenAI felt a need to create this new role type. At the moment it seems like it's simply a rename, which makes the current decisions around not having any compatibility between "system" and "developer" very confusing.
Regardless, while I do appreciate the effort, the PR as it currently stands is not a complete solution and not something to be merged. It's adding a public ChatRole that won't be appropriately recognized by the only target that could even support it today.
🎉 Good job! The coverage increased 🎉
Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=900944&view=codecoverage-tab |
Adds
developer
role foro1
and newer models.See: Developer messages
Microsoft Reviewers: Open in CodeFlow