feat: correct the captitalization (#33)
* feat: correct the captitalization

* workflow(ci): fix lint error

---------

Co-authored-by: zhouxiao.shaw <[email protected]>
yuyutaotao and zhoushaw authored Aug 5, 2024
1 parent 7ee0bdd commit 59081e3
Showing 27 changed files with 82 additions and 82 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -96,7 +96,7 @@ playwright-report/
blob-report/
playwright/.cache/

# MidScene.js dump files
# Midscene.js dump files
__ai_responses__/


8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -1,6 +1,6 @@
# MidScene Contribution Guide
# Midscene Contribution Guide

Thanks for that you are interested in contributing to MidScene. Before starting your contribution, please take a moment to read the following guidelines.
Thanks for that you are interested in contributing to Midscene. Before starting your contribution, please take a moment to read the following guidelines.

---

@@ -130,7 +130,7 @@ npx nx test @midscene/web

### Run E2E Tests

MidScene uses [playwright](https://github.com/microsoft/playwright) to run end-to-end tests.
Midscene uses [playwright](https://github.com/microsoft/playwright) to run end-to-end tests.

You can run the `e2e` command to run E2E tests:

@@ -193,7 +193,7 @@ feat(core): Add `myOption` config

## Versioning

All MidScene packages will use a fixed unified version.
All Midscene packages will use a fixed unified version.

The release notes are automatically generated by [GitHub releases](https://github.com/web-infra-dev/midscene/releases).

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2021-present MidScene.js
Copyright (c) 2021-present Midscene.js

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
10 changes: 5 additions & 5 deletions README.md
@@ -1,9 +1,9 @@
<p align="center">
<img alt="MidScene.js" width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
<img alt="Midscene.js" width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
</p>


<h1 align="center">MidScene.js</h1>
<h1 align="center">Midscene.js</h1>
<div align="center">

English | [简体中文](./README.zh.md)
@@ -20,11 +20,11 @@ English | [简体中文](./README.zh.md)
<img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" />
</p>

MidScene.js is an AI-powered automation SDK can control the page, perform assertions, and extract data in JSON format using natural language.
Midscene.js is an AI-powered automation SDK can control the page, perform assertions, and extract data in JSON format using natural language.

## Features ✨

- **Natural Language Interaction 👆**: Describe the steps and let MidScene plan and control the user interface for you
- **Natural Language Interaction 👆**: Describe the steps and let Midscene plan and control the user interface for you
- **Understand UI, Answer in JSON 🔍**: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
- **Intuitive Assertion 🤔**: Make assertions in natural language. It’s all based on AI understanding.
- **Out-of-box LLM 🪓**: It is fine to use public multimodal LLMs like GPT-4o. There is no need for any custom training.
@@ -40,4 +40,4 @@ MidScene.js is an AI-powered automation SDK can control the page, perform assert

## License

MidScene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
Midscene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
10 changes: 5 additions & 5 deletions README.zh.md
@@ -1,8 +1,8 @@
<p align="center">
<img alt="MidScene.js" width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
<img alt="Midscene.js" width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
</p>

<h1 align="center">MidScene.js</h1>
<h1 align="center">Midscene.js</h1>
<div align="center">

[English](./README.md) | 简体中文
@@ -20,11 +20,11 @@
</p>


MidScene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对网页进行操作、验证,并提取 JSON 格式的数据。
Midscene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对网页进行操作、验证,并提取 JSON 格式的数据。

## 特性 ✨

- **自然语言互动 👆**:只需描述你的步骤,MidScene 会为你规划和操作用户界面
- **自然语言互动 👆**:只需描述你的步骤,Midscene 会为你规划和操作用户界面
- **理解UI、JSON格式回答 🔍**:你可以提出关于数据格式的要求,然后得到 JSON 格式的预期回应。
- **直观断言 🤔**:用自然语言表达你的断言,AI 会理解并处理。
- **开箱即用的LLM 🪓**:使用公开的多模态大语言模型( 如GPT-4o ),无需任何定制训练。
@@ -40,4 +40,4 @@ MidScene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对

## 授权许可

MidScene.js 遵循 [MIT 许可协议](https://github.com/web-infra-dev/midscene/blob/main/LICENSE)
Midscene.js 遵循 [MIT 许可协议](https://github.com/web-infra-dev/midscene/blob/main/LICENSE)
6 changes: 3 additions & 3 deletions apps/site/docs/en/docs/getting-started/introduction.mdx
@@ -2,9 +2,9 @@

UI automation can be frustrating, often involving a maze of *#ids*, *data-test-xxx* attributes, and *.selectors* that are difficult to maintain, especially when the page undergoes a refactor.

Introducing MidScene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.
Introducing Midscene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.

MidScene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.
Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.

## Features

@@ -38,6 +38,6 @@ You may open the [Online Visualization Tool](/visualization/index.html) to see t

## Flow Chart

Here is a flowchart illustrating the core process of MidScene.
Here is a flowchart illustrating the core process of Midscene.

![](/flow.png)
4 changes: 2 additions & 2 deletions apps/site/docs/en/docs/getting-started/quick-start.mdx
@@ -126,7 +126,7 @@ Promise.resolve(
await page.goto("https://www.ebay.com");
await sleep(5000);

// 👀 init MidScene agent
// 👀 init Midscene agent
const mid = new PuppeteerAgent(page);

// 👀 type keywords, perform a search
@@ -178,7 +178,7 @@ npx ts-node demo.ts

### Step 4. view test report after running

After running, MidScene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
After running, Midscene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.


## View demo report
16 changes: 8 additions & 8 deletions apps/site/docs/en/docs/more/faq.md
@@ -1,29 +1,29 @@
# FAQ

### Can MidScene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'"
### Can Midscene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'"

MidScene is an automation assistance SDK with a key feature of action stability — ensuring the same actions are performed in each run. To maintain this stability, we encourage you to provide detailed instructions to help the AI understand each step of your task.
Midscene is an automation assistance SDK with a key feature of action stability — ensuring the same actions are performed in each run. To maintain this stability, we encourage you to provide detailed instructions to help the AI understand each step of your task.

If you require a 'goal-to-task' AI planning tool, you can develop one based on MidScene.
If you require a 'goal-to-task' AI planning tool, you can develop one based on Midscene.

Related Docs:
* [Tips for Prompting](./prompting-tips.html)

### Limitations

There are some limitations with MidScene. We are still working on them.
There are some limitations with Midscene. We are still working on them.

1. The interaction types are limited to only tap, type, keyboard press, and scroll.
2. It's not 100% stable. Even GPT-4o can't return the right answer all the time. Following the [Prompting Tips](./prompting-tips) will help improve stability.
3. Since we use JavaScript to retrieve items from the page, the elements inside the iframe cannot be accessed.

### Which LLM should I choose ?

MidScene needs a multimodal Large Language Model (LLM) to understand the UI. Currently, we find that OpenAI's GPT-4o performs much better than others.
Midscene needs a multimodal Large Language Model (LLM) to understand the UI. Currently, we find that OpenAI's GPT-4o performs much better than others.

### About the token cost

Image resolution and element numbers (i.e., a UI context size created by MidScene) will affect the token bill.
Image resolution and element numbers (i.e., a UI context size created by Midscene) will affect the token bill.

Here are some typical data with GPT-4o.

@@ -37,8 +37,8 @@ Here are some typical data with GPT-4o.
### The automation process is running more slowly than it did before

Since MidScene.js invokes AI for each planning and querying operation, the running time may increase by a factor of 3 to 10 compared to traditional Playwright scripts, for instance from 5 seconds to 20 seconds. This is currently inevitable but may improve with advancements in LLMs.
Since Midscene.js invokes AI for each planning and querying operation, the running time may increase by a factor of 3 to 10 compared to traditional Playwright scripts, for instance from 5 seconds to 20 seconds. This is currently inevitable but may improve with advancements in LLMs.

Despite the increased time and cost, MidScene stands out in practical applications due to its unique development experience and easy-to-maintain codebase. We are confident that incorporating automation scripts powered by MidScene will significantly enhance your project’s efficiency, cover many more situations, and boost overall productivity.
Despite the increased time and cost, Midscene stands out in practical applications due to its unique development experience and easy-to-maintain codebase. We are confident that incorporating automation scripts powered by Midscene will significantly enhance your project’s efficiency, cover many more situations, and boost overall productivity.

In short, it is worth the time and cost.
4 changes: 2 additions & 2 deletions apps/site/docs/en/docs/more/prompting-tips.md
@@ -1,6 +1,6 @@
# Tips for Prompting

The natural language parameter passed to MidScene will be part of the prompt sent to the LLM. There are certain techniques in prompt engineering that can help improve the understanding of user interfaces.
The natural language parameter passed to Midscene will be part of the prompt sent to the LLM. There are certain techniques in prompt engineering that can help improve the understanding of user interfaces.

### The purpose of optimization is to get a stable response from AI

@@ -28,7 +28,7 @@ Bad ❌: "[number, number], the [x, y] coords of the main button"

### Use visualization tool to debug

Use the visualization tool to debug and understand each step of MidScene. Just upload the log, and view the AI's parse results. You can find [the tool](/visualization/) on the navigation bar on this site.
Use the visualization tool to debug and understand each step of Midscene. Just upload the log, and view the AI's parse results. You can find [the tool](/visualization/) on the navigation bar on this site.

### Remember to cross-check the result by assertion

18 changes: 9 additions & 9 deletions apps/site/docs/en/docs/usage/API.md
@@ -2,7 +2,7 @@

## config AI vendor

MidScene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.
Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.

There are the main configs, in which `OPENAI_API_KEY` is required.

@@ -50,7 +50,7 @@ You can view the integration sample in [quick-start](../getting-started/quick-st
### `.aiAction(steps: string)` or `.ai(steps: string)` - Control the page

You can use `.aiAction` to perform a series of actions. It accepts a `steps: string` as a parameter, which describes the actions. In the prompt, you should clearly describe the steps. MidScene will take care of the rest.
You can use `.aiAction` to perform a series of actions. It accepts a `steps: string` as a parameter, which describes the actions. In the prompt, you should clearly describe the steps. Midscene will take care of the rest.

`.ai` is the shortcut for `.aiAction`.

@@ -66,18 +66,18 @@ await mid.ai('Click the "completed" status button below the task list');

Steps should always be clearly and thoroughly described. A very brief prompt like 'Tweet "Hello World"' will result in unstable performance and a high likelihood of failure.

Under the hood, MidScene will plan the detailed steps by sending your page context and a screenshot to the AI. After that, MidScene will execute the steps one by one. If MidScene deems it impossible to execute, an error will be thrown.
Under the hood, Midscene will plan the detailed steps by sending your page context and a screenshot to the AI. After that, Midscene will execute the steps one by one. If Midscene deems it impossible to execute, an error will be thrown.

The main capabilities of MidScene are as follows, and your task will be split into these types. You can see them in the visualization tools:
The main capabilities of Midscene are as follows, and your task will be split into these types. You can see them in the visualization tools:

1. **Locator**: Identify the target element using a natural language description
2. **Action**: Tap, scroll, keyboard input, hover
3. **Others**: Sleep

Currently, MidScene can't plan steps that include conditions and loops.
Currently, Midscene can't plan steps that include conditions and loops.

Related Docs:
* [FAQ: Can MidScene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'](../more/faq.html)
* [FAQ: Can Midscene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'](../more/faq.html)
* [Tips for Prompting](../more/prompting-tips.html)

### `.aiQuery(dataDemand: any)` - extract any data from page
@@ -107,9 +107,9 @@ const dataC = await mid.aiQuery('{name: string, age: string}[], Data Record in t

### `.aiAssert(conditionPrompt: string, errorMsg?: string)` - do an assertion

This method will soon be available in MidScene.
This method will soon be available in Midscene.

`.aiAssert` works just like the normal `assert` method, except that the condition is a prompt string written in natural language. MidScene will call AI to determine if the `conditionPrompt` is true. If not, a detailed reason will be concatenated to the `errorMsg`.
`.aiAssert` works just like the normal `assert` method, except that the condition is a prompt string written in natural language. Midscene will call AI to determine if the `conditionPrompt` is true. If not, a detailed reason will be concatenated to the `errorMsg`.

```typescript
// coming soon
@@ -132,7 +132,7 @@ export LANGCHAIN_API_KEY="your_key_here"
export LANGCHAIN_PROJECT="your_project_name_here"
```

Launch MidScene, you should see logs like this:
Launch Midscene, you should see logs like this:

```log
DEBUGGING MODE: langsmith wrapper enabled
6 changes: 3 additions & 3 deletions apps/site/docs/en/index.md
@@ -2,7 +2,7 @@
pageType: home

hero:
name: MidScene.js
name: Midscene.js
text: |
Powered by AI
Joyful UI Automation
@@ -16,10 +16,10 @@ hero:
link: /docs/getting-started/quick-start
image:
src: /midscene.png
alt: MidScene Logo
alt: Midscene Logo
features:
- title: Natural Language Interaction
details: Describe the steps and let MidScene plan and control the user interface for you
details: Describe the steps and let Midscene plan and control the user interface for you
icon: 👆
- title: Understand UI, Answer in JSON
details: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
6 changes: 3 additions & 3 deletions apps/site/docs/zh/docs/getting-started/introduction.mdx
@@ -2,9 +2,9 @@

UI 自动化太难写了。自动化脚本里到处都是选择器,比如 `#ids``data-test-xxx``.selectors`。在页面重构的时候,维护自动化脚本更将会是一场灾难。

我们在这里推出 MidScene.js,助你重拾编码的乐趣。
我们在这里推出 Midscene.js,助你重拾编码的乐趣。

MidScene.js 采用了多模态大语言模型(LLM),能够直观地“理解”你的用户界面并执行必要的操作。你只需描述交互步骤或期望的数据格式,AI 就能为你完成任务。
Midscene.js 采用了多模态大语言模型(LLM),能够直观地“理解”你的用户界面并执行必要的操作。你只需描述交互步骤或期望的数据格式,AI 就能为你完成任务。

# 特性

@@ -48,6 +48,6 @@ const dataB = await agent.aiQuery('string[], 任务列表中的任务名');

## 流程图

下图展示了 MidScene 的核心流程。
下图展示了 Midscene 的核心流程。

![](/flow.png)
4 changes: 2 additions & 2 deletions apps/site/docs/zh/docs/getting-started/quick-start.mdx
@@ -131,7 +131,7 @@ Promise.resolve(
await page.goto("https://www.ebay.com");
await sleep(5000);

// 👀 初始化 MidScene agent
// 👀 初始化 Midscene agent
const mid = new PuppeteerAgent(page);

// 👀 执行搜索
@@ -185,7 +185,7 @@ npx ts-node demo.ts

### 第四步:查看运行报告

运行 MidScene 之后,系统会生成一个日志文件,默认存放在 `./midscene_run/report/latest.web-dump.json`。然后,你可以把这个文件导入 [可视化工具](/visualization/),这样你就能更清楚地了解整个过程。
运行 Midscene 之后,系统会生成一个日志文件,默认存放在 `./midscene_run/report/latest.web-dump.json`。然后,你可以把这个文件导入 [可视化工具](/visualization/),这样你就能更清楚地了解整个过程。

## 访问示例报告
