I'd rather write programs to write programs than write programs.
— Dick Sites, May 1974
Building curlconverter requires Node.js 16+ and Emscripten 3.
On Ubuntu 22.10 you can install Node.js 18, npm and Emscripten 3.1.6 with
sudo apt update
sudo apt install nodejs npm emscripten
On older Ubuntu versions, follow these instructions for installing Node.js.
On macOS, install them with Homebrew
brew install node emscripten
If you add a new generator, you'll need to
- add it to README.md to the first sentence and to the CLI documentation
- export it in index.ts
- add it to cli.ts to the imports, the list of args and to the
--help
message - add it to test-utils.ts (to make it part of the testing)
- generate tests with
npm run gen-test -- --all --language <language>
- add it to tools/compare-requests.ts, if possible
- generate tests with
To add a new test
- create a file containing the curl command in test/fixtures/curl_commands/ with a descriptive filename like
post_with_headers.sh
- run
npm run gen-test post_with_headers
to save the result of converting that file to test/fixtures/<language>/ with a matching filename but different file extension likepost_with_headers.py
- modify the generator and re-run
npm run gen-test
until your test is converted correctly
- modify the generator and re-run
- run
npm test
to make sure all the old tests still pass
You can run a specific test with:
npm test -- --test post_with_data_binary
where post_with_data_binary
is the name of a file in test/fixtures/curl_commands/. Any path or file extension in the test name argument is ignored so npm test -- --test some/random/path/post_with_data_binary.py
is fine as long as test/fixtures/curl_commands/post_with_data_binary.sh exists.
You can run only the tests for a specific language generator with:
npm test -- --language python
First check which characters the input is made up of with https://verhovs.ky/text-inspector/ or xxd
. It might contain non-breaking spaces for example.
Next, check how the Bash is parsed on the tree-sitter playground, but keep in mind that the playground might run a different version of tree-sitter-bash.
curlconverter supports two types of input, either a string of Bash code:
toPython('curl example.com')
(used on curlconverter.com)echo curl example.com | curlconverter -
or an array of arguments:
toPython(['curl', 'example.com'])
curlconverter example.com
There are 6 steps. When the input is a string, we:
- Parse Bash code into an abstract syntax tree (AST) using tree-sitter-bash (an imperfect approximation of the real Bash grammar)
- Find the curl commands in the AST
- Convert the command AST nodes into an array of
Word
objects (shell tokens), doing things like removing quotes around strings. If the command redirects stdin, we also keep track of that
curl --data $VAR "example.com"
->
program [0, 0] - [2, 0]
command [0, 0] - [0, 30]
name: command_name [0, 0] - [0, 4]
word [0, 0] - [0, 4]
argument: word [0, 5] - [0, 11]
argument: simple_expansion [0, 12] - [0, 16]
variable_name [0, 13] - [0, 16]
argument: string [0, 17] - [0, 30]
->
['curl', '--data', <Shell variable>, 'example.com']
These first 3 steps are skipped if the input is already an array of strings.
- Iterate over the list of shell tokens and convert it into an object that mostly just maps argument names to either a boolean, a
Word
or list ofWord
s. A few arguments write to the same argument name.
->
[{
url: 'example.com',
data: <Shell variable>
}]
- Convert that object into a more advanced object, for example with the input URLs split into their parts: scheme/host/etc.
- Generate a string of code in the desired output language from that object
The entry point of the code is a bunch of toPython()
, toJavaScript()
, etc. functions. They store a list of which curl arguments are supported by the language they generate, and pass that into the parser so that we can report when the input command uses an argument that is ignored by the code generator.
They call parse()
which performs steps 1, 2 and 3 and then calls parseArgs()
, essentially a re-implementation of curl's parse_args()
that implements step 4.
curlconverter's command line interface is a drop-in replacement for the curl
command but adds two of its own arguments (--language <language>
selects the output language and -
/--stdin
reads the input command from stdin instead of from the arguments and raises an error if other arguments (except --verbose
) are passed), and repurposes curl's --verbose
argument to enable printing of warnings and JavaScript error stack traces during conversion.
First, Bash parses the curl command you type, expands the environment variables and passes an array of strings to the curl binary. Bash's parser is implemented using GNU Bison.
https://github.com/bminor/bash/blob/master/parse.y
curl is split into two parts, the C library (in lib/) and the command line interface to that C library (in src/). Like every C program, the array of arguments from the shell is passed into its main()
function
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_main.c#L237
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_operate.c#L2594
and is passed to parse_args()
, which is where curl does the first (and most important) step of interpreting the arguments
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_getparam.c#L2401
Each argument is either a binary toggle or will consume a value after itself. Files are opened for reading and writing at this step as well.
The result is a linked list of structs. A curl command can query multiple URLs and each URL can contain "glob" expressions,
curl http://example.com/file[1-100:10].txt http://example.com/another-range/[1-10].png
will query 20 URLs. Each item in the linked list is a "URL" and it can have a corresponding output file handle and things like that. There is also a "global" struct that contains options which will apply to all URLs. These structs are defined in
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_cfgable.h
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_urlglob.h
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_sdecls.h
Then it passes this linked list to run_all_transfers()
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_operate.c#L2684
and ultimately the code ends up in single_transfer()
, which creates the object that the C library expects, mostly this just copies stuff from the struct to a new struct
https://github.com/curl/curl/blob/curl-7_84_0/src/tool_operate.c#L688
- https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html
- http://aosabook.org/en/bash.html
- http://mywiki.wooledge.org/BashParser
- Run Bash through VS Code's debugger: https://stackoverflow.com/a/75539741
- https://curl.se/docs/manpage.html
- https://curl.se/libcurl/c/libcurl-tutorial.html
- https://everything.curl.dev/
- Run curl through VS Code's debugger: https://stackoverflow.com/a/75523303
https://hg.mozilla.org/mozilla-central/file/tip/devtools/client/shared/curl.js
- jeayu (Java support)
- Muhammad Reza Irvanda (python env vars)
- Weslen Nascimento (Node fetch)
- Roman Druzki (Backlog scrubbing, parsing improvements)
- NoahCardoza (Command line interface)
- ssi-anik (JSON support)
- hrbrmstr (R support)
- daniellockard (Go support)
- eliask (improve python output)
- trdarr (devops and code style)
- nashe (fix PHP output)
- bfontaine (reduce code duplication in test suite)
- seadog007
- nicktimko
- wkalt
- nico202
- r3m0t
- csells (Dart support)
- yanshiyason (Elixir support)
- Robertof (Rust enhancements, correctness, es6)
- clintonc (Code quality / brevity, test suite consistency)
- MarkReeder (JSON formatting)
- cf512 (bugfixes and feature requests)
- DainisGorbunovs (MATLAB support)
- TennyZhuang (data-raw support)
- scottsteinbeck (CFML support)
- CBaldemir (Java + jsoup support)