
Add support for Keyin API for itwin.js for AI usecase #7614

Open
khanaffan opened this issue Jan 27, 2025 · 6 comments

@khanaffan
Contributor

A KeyIn allows features to be invoked through textual commands in MicroStation. Although this mechanism is old, it can be a very effective way to bridge the complexity of the UI and the application to LLMs.

  • Offer a registry for KeyIns, including commands, their arguments, descriptions of the arguments and commands, along with example invocations.
  • Ensure all UI components in itwin.js register any commands they expose via the UI as KeyIns, so they can be advertised to LLMs.
  • Enable non-UI functionalities to be exposed via KeyIns as well, such as importing schemas or creating cubes.
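A registry along these lines might be sketched as follows (a sketch only; the names `KeyInRegistry`, `KeyInDefinition`, and `KeyInParameter` are hypothetical, not existing iTwin.js API):

```typescript
// Sketch of the proposed key-in registry. All names are hypothetical,
// not part of the current iTwin.js API.
interface KeyInParameter {
  name: string;
  type: "string" | "number" | "boolean";
  description: string;
}

interface KeyInDefinition {
  type: "function";
  name: string;
  description: string;
  parameters: KeyInParameter[];
  /** Example invocations that can be advertised to an LLM. */
  examples?: string[];
  handler: (...args: any[]) => Promise<void>;
}

class KeyInRegistry {
  private readonly _defs = new Map<string, KeyInDefinition>();

  /** Register a key-in; throws if the name is already taken. */
  public register(def: KeyInDefinition): void {
    if (this._defs.has(def.name))
      throw new Error(`Key-in "${def.name}" is already registered`);
    this._defs.set(def.name, def);
  }

  public find(name: string): KeyInDefinition | undefined {
    return this._defs.get(name);
  }

  /** All registered definitions, e.g. for building an LLM tool list. */
  public get all(): KeyInDefinition[] {
    return [...this._defs.values()];
  }
}
```

Because descriptions and example invocations live alongside the handler, the same entry can serve both human discovery and LLM tool advertising.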

Some advantages:

  • This straightforward infrastructure enables itwin.js apps to leverage LLMs efficiently without needing to write any code to invoke the feature.
  • It also facilitates the creation of compound Keyins for more complex tasks, allowing LLMs to handle them.
  • It supports cross-application functionality, effectively decoupling LLM/AI code from the application. This allows the AI team to concentrate on AI workflows rather than figuring out how to invoke individual application features.

The registry should expose key-ins in the same way an LLM exposes tools/functions: the tool and each of its parameters carry a description, so the LLM can fill in the parameters from user input.

import { Id64String } from "@itwin/core-bentley";

// "keyIns" is the proposed (hypothetical) registry, not an existing API.
keyIns.register({
  type: "function",
  name: "toggle_subject_by_id",
  description: "Toggle subject by id",
  parameters: [
    {
      name: "id",
      type: "string",
      description: "A hex string identifying the subject",
    },
  ],
  handler: async (id: Id64String) => {
    // toggle display of the subject with this id
  },
});
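To connect such a registration to LLM function calling, each registry entry could be mechanically converted into a tool schema. A sketch, assuming a hypothetical `KeyInDefinition` shape mirroring the registration above and an OpenAI-style tool format:

```typescript
// Sketch: deriving an OpenAI-style tool/function schema from a key-in
// definition. The KeyInDefinition shape here is hypothetical, mirroring
// the registration example in this issue.
interface KeyInParameter {
  name: string;
  type: string;
  description: string;
}

interface KeyInDefinition {
  type: "function";
  name: string;
  description: string;
  parameters: KeyInParameter[];
}

function toToolSchema(def: KeyInDefinition) {
  // Collect each parameter into a JSON-Schema-style properties object.
  const properties: Record<string, { type: string; description: string }> = {};
  for (const p of def.parameters)
    properties[p.name] = { type: p.type, description: p.description };

  return {
    type: "function" as const,
    function: {
      name: def.name,
      description: def.description,
      parameters: {
        type: "object" as const,
        properties,
        required: def.parameters.map((p) => p.name),
      },
    },
  };
}
```

Every registered key-in could then be mapped through `toToolSchema` to build the tool list passed to the model, with no per-feature AI code.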
@rajkanben

This key-ins approach would help in:

  • Copilot use cases, where an NLP request can be translated into key-in commands for execution
  • NLP-to-key-in parsing can be done with small AI models running in WebGPU or locally (no need for larger models with PTU)
  • Automating workflows using key-ins
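The parsing step can stay deliberately simple: a small model only needs to emit a key-in string, which the client then splits into a command and arguments. A minimal sketch (quoting rules are simplified here; real MicroStation key-in parsing is richer):

```typescript
// Sketch: split a key-in string (as a small local model might emit it)
// into a command name and positional arguments. Double-quoted arguments
// may contain spaces; everything else is whitespace-delimited.
function parseKeyIn(input: string): { command: string; args: string[] } {
  // Tokenize: either a double-quoted run or a run of non-whitespace.
  const tokens = input.trim().match(/"[^"]*"|\S+/g) ?? [];
  const [command = "", ...rest] = tokens;
  // Strip surrounding quotes from quoted arguments.
  const args = rest.map((t) => t.replace(/^"(.*)"$/, "$1"));
  return { command, args };
}
```

The resulting `command` would be looked up in the registry and its `args` passed to the registered handler.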

@pmconne
Member

pmconne commented Jan 27, 2025

Key-in commands are useful beyond LLM use cases; at one point early in the design of iTwin.js editing capabilities, some folks were pushing for everything to be based on commands so that everything could be scripted.

Today all Tools have a key-in as an entry point, but for most of them that just starts the tool, which then reacts to an often complex series of user inputs. Those user inputs (change a tool setting value, enter a data point, etc.) would need to be scriptable as well. For an LLM to understand those interactions, it would need more context than just the command description, such as the prompts we display to users (e.g., when creating a line string, after the first two points are entered we ask the user to subsequently enter more data points or click the reset button to accept). Alternatively, the series of inputs would have to produce a single command that captures the entire operation.

Either way, developing tools this way is a big ask for app developers - it's more complicated and takes more time - so the benefits would have to be obvious and relatively immediate. (MicroStation's support for scripting similarly varies across modules; many tasks can only be accomplished through user interaction, because nothing strictly enforces otherwise.)

@ColinKerr
Member

@wgoehrig I believe there is overlap between this request and ideas for editing.

I think @karolis-zukauskas was also interested in editing APIs written as a function library.

@jmoutte

jmoutte commented Feb 2, 2025

It is certainly desirable to make our APIs easy for LLMs to consume, to enable automation. I wonder whether key-ins are the right approach, though. If you allow me to simplify a bit, a key-in is just another, higher-level, API. The question then becomes: does the LLM really need that higher-level API, or can it work with the existing one?

Creating and, more importantly, maintaining a new API is costly. We should always think very carefully before we introduce new APIs and measure the return on investment. In a world where LLMs become pervasive, I am quite convinced that the use of key-ins by humans will decline in favor of interactions with LLMs that do the automation for them more efficiently. Shouldn't we instead remove key-ins and make LLMs proficient at using the lower-level APIs?

@ramanujam-raman

ramanujam-raman commented Feb 2, 2025

A couple of related notes -

@rajkanben

For data workflows, maybe we can also think of leveraging GraphQL as an intermediate layer. GraphQL can complement LLMs as a tool for agents to automate workflows.
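As an illustration only (the schema below is invented; no such GraphQL API exists in iTwin.js today), an agent tool might issue a query like:

```typescript
// Sketch: a GraphQL query an agent tool might issue through such an
// intermediate layer. The schema (elements, category filter, fields)
// is illustrative, not an existing iTwin.js API.
const elementsByCategoryQuery = `
  query ElementsByCategory($category: String!) {
    elements(filter: { category: $category }) {
      id
      userLabel
    }
  }
`;

// The agent supplies variables extracted from the user's request:
const variables = { category: "Walls" };
```

The query text and its typed variables play the same role as a tool schema: the model fills in values, and the layer validates them against the schema before execution.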
