ulp-forth is a Forth interpreter and optimizing cross compiler for the
ESP32 ULP coprocessor. It has most of the Forth 2020 standard
implemented. Code is interpreted on the host machine, when
that is finished it compiles the main
word for execution by the
ULP.
This is a 16 bit Forth implementation. It is case insensitive.
Several design choices were made to save ULP memory:
- The ULP cannot modify the dictionary or allocate memory.
- Only words used by
main
are cross compiled (pruning). - There are no error checks in the cross compiled output.
The Forth 2020 standard was followed for this implementation. There are some missing words, but all of the implemented words follow the standard and pass the test suite.
Copyright 2024-2025 Blake Felt [email protected]
- Installation
- Building ulp-forth
- Using ulp-forth
- Sharing memory
- Threading models
- Assembly words
- System words
- Clock words
- GPIO words
- Serial words
- I2C words
- Standard Core words
- Standard Core Extension words
- Standard Double words
- Optimizations
Releases can be found on the release page.
You can also build the latest tagged version from source with:
go install github.com/Molorius/ulp-forth@latest
The compiler can be built from source with
go build
Unit tests are run on the host and on a ulp emulator, they can be run with
go test ./...
Help running the program can be found with ulp-forth --help
.
The interpreter can be run with
ulp-forth run
This can be used for testing logic, but only runs on the host so
cannot be used for testing hardware. Type bye
to exit.
You can load files before the interpreter starts by including them in run the command.
ulp-forth run your_code.f
The cross compiler can be run with
ulp-forth build your_code.f
The user should pass in the list of files to be built, which will be interpreted in order before cross compiling the MAIN
word. So
ulp-forth build first.f second.f
will first interpret first.f, then second.f, then cross compile the MAIN
word.
-
--assembly
Output assembly that can be compiled by the main assemblers, set the --reserved flag before using. -
--custom_assembly
Output assembly only for use by ulp-asm, another project by this author. -
--output
Name of the output file. -
--reserved
Number of reserved bytes for the ULP, for use with --assembly flag (default 8176). Note that the Espressif linker has a bug so has 12 less total bytes. Any space not used by code or data is used for the stacks. -
--subroutine
Use the subroutine threading model, see the threading models section. Faster but larger.
There are words that can be used to share memory with the esp32. When compiled with the --custom_assembly
or --assembly
flags, the output assembly will include the .global
directive for the associated memory. This memory will not be optimized away.
Shared word | Equivalent word |
---|---|
GLOBAL-VARIABLE |
VARIABLE |
GLOBAL-2VARIABLE |
2VARIABLE |
GLOBAL-ALLOCATE |
ALLOCATE |
Access should be done while holding the mutex, see the System words section.
Example:
global-variable example \ create a global variable named "example"
\ read an address while holding the mutex
: global@ ( address -- n )
mutex.take \ take ownership of the mutex
@ \ read the value at the address
mutex.give \ release the mutex
;
\ write to an address while holding the mutex
: global! ( n address -- )
mutex.take \ take ownership of the mutex
! \ write the value to the address
mutex.give \ release the mutex
;
\ get the value at "example"
: get-example ( -- )
example \ put the address of the memory onto the stack
global@ \ read it
;
\ set the value at "example"
: set-example ( n -- )
example \ put the address of the memory onto the stack
global! \ write to it
;
There are two threading models for the output ULP code. This is the forth definition of "threading" and is not the same as multithreading in other languages. It can be thought of as the execution environment.
Token threading is usually smaller and subroutine threading is usually faster, but this can vary based on the program and optimizations.
This uses a lightweight virtual machine to execute all forth words. This allows for some very compact code, but there is a speed penalty for the virtual machine.
Code using this is roughly 20% smaller than subroutine threaded code.
This can be enabled with the --subroutine
flag. It compiles all forth words into assembly subroutines. This is very fast while executing, but there is a size penalty.
Code using this is roughly 20% faster than token threaded code.
A few words are provided to make ULP assembly without extending the compiler.
ASSEMBLY ( objn objn-1 ... obj0 n "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create a
definition for name
that compiles to token threaded ULP assembly.
The assembly is the contents of the objects on the stack, with object count n
.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY should not access the return stack.
Example:
c" move r0, "
0x10 \ include this number
c" \njump next"
3 \ we want to compile the 3 items on the stack
ASSEMBLY MY-EXAMPLE
That will create a word MY-EXAMPLE
which will output the
token threaded assembly:
move r0, 16
jump next
ASSEMBLY-SRT ( objn objn-1 ... obj0 n "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create a
definition for name
that compiles to subroutine threaded ULP assembly.
The assembly is the contents of the objects on the stack, with object count n
.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY-SRT should not access the return stack. A return is automatically appended.
Example:
c" move r0, "
0x10 \ include this number
\
2 \ we want to compile the 2 items on the stack
ASSEMBLY-SRT MY-EXAMPLE
That will create a word MY-EXAMPLE
which will output the
subroutine threaded assembly:
move r0, 16
add r2, r2, 1
jump r2
ASSEMBLY-BOTH
(
objAn objAn-1 ... objA0 n
objBn objBn-1 ... objB0 m
"\<spaces\>name" --
)
Skip leading spaces. Parse name
delimited by a space. Create a
definition for name
that compiles to token threaded and subroutine
threaded ULP assembly.
The token threaded assembly is the contents of the objA
objects
on the stack, with object count n
.
The subroutine threaded assembly is the contents of the objB
objects
on the stack, with object count m
.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY-BOTH should not access the return stack. A return is automatically appended to the subroutine threaded assembly.
Example:
\ token threaded
c" move r0, "
0x10 \ include this number
2 \ we want to compile the 2 items on the stack
\ subroutine threaded
c" move r1, "
0x11 \ include this number
2 \ we want to compile the 2 items on the stack
ASSEMBLY-BOTH MY-EXAMPLE
That will create a word MY-EXAMPLE
which will output the
token threaded assembly:
move r0, 16
jump next
and the subroutine threaded assembly:
move r1, 17
add r2, r2, 1
jump r2
READ_RTC_REG ( addr low width "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create an
assembly definition for name
that reads from RTC address addr
,
low bit low
, width width
; the result is placed on the stack.
WRITE_RTC_REG ( addr low width data "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create an
assembly definition for name
that writes to RTC address addr
,
low bit low
, width width
.
2WRITE_RTC_REG
( addr0 high0 low0 data0
addr1 high1 low1 data1 "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create an
assembly definition for name
that writes to RTC address addr0
followed by address addr1
.
System words only run on the ULP.
HALT ( -- )
Halt execution of the ULP. Execution will resume at the instruction immediately following the HALT on both token and subroutine threaded models.
MUTEX.TAKE ( -- )
Takes the software mutex. The example project includes esp32 code to use this but a better way to use it needs to be written.
MUTEX.GIVE ( -- )
Gives the software mutex. The example project includes esp32 code to use this but a better way to use it needs to be written.
Clock words only run on the ULP.
The ULP has access to the RTC_SLOW clock. ulp-forth also has some busy-wait words for delays. Note that the ULP runs on the RTC_FAST clock, which we cannot read.
RTC_CLOCK ( -- d )
Read the lower 32 bits of the rtc_slow clock.
RTC_CLOCK_DELAY ( d -- )
Delay for d rtc_slow ticks.
BUSY_DELAY ( n -- )
Delay n times in a tight assembly loop.
DELAY_MS ( n -- )
Delay n milliseconds. The accuracy of this is affected by temperature and is device dependent.
The ULP can access certain pins called RTC_GPIO. These are mapped to the GPIO as well. Words are defined to help interface with them.
Each of these words are written with the prefix GPIOn, where n is the pin number. There are words for both the RTC_GPIOn and GPIOn numbers. Words are only written for GPIO that the ULP can access, and if a pin doesn't support output then the output words aren't defined.
Below is a table of all pins accessible to the ULP. RTC_GPIO is the naming used by the RTC subsystem, GPIO is the naming used by the rest of the ESP32 documentation.
GPIO | RTC_GPIO | Notes |
---|---|---|
36 | 0 | Input only, no pullups or pulldowns. |
37 | 1 | Input only, no pullups or pulldowns. |
38 | 2 | Input only, no pullups or pulldowns. |
39 | 3 | Input only, no pullups or pulldowns. |
34 | 4 | Input only, no pullups or pulldowns. |
35 | 5 | Input only, no pullups or pulldowns. |
25 | 6 | |
26 | 7 | |
33 | 8 | |
32 | 9 | |
4 | 10 | |
0 | 11 | |
2 | 12 | |
15 | 13 | |
13 | 14 | |
12 | 15 | |
14 | 16 | |
27 | 17 |
Not all pins are tested.
GPIOn.ENABLE ( -- )
Enable the usage of this pin by the ULP.
GPIOn.OUTPUT_ENABLE ( -- )
Enable output on this pin.
GPIOn.OUTPUT_DISABLE ( -- )
Disable output on this pin.
GPIOn.INPUT_ENABLE ( -- )
Allow reading from this pin.
GPIOn.SET_HIGH ( -- )
Set this pin to high.
GPIOn.SET_LOW ( -- )
Set this pin to low.
GPIOn.SET ( n -- )
If n is 0, set this pin to low. Otherwise set to high.
GPIOn.GET ( -- n )
Get the value of this pin. 0 or 1.
GPIOn.PULLUP_ENABLE ( -- )
Enable the pullup resistors on this pin.
GPIOn.PULLUP_DISABLE ( -- )
Disable the pullup resistors on this pin.
GPIOn.PULLDOWN_ENABLE ( -- )
Enable the pulldown resistors on this pin.
GPIOn.PULLDOWN_DISABLE ( -- )
Disable the pulldown resistors on this pin.
GPIO_NUMBER_TO_RTC ( gpio_num -- rtc_gpio_num )
Convert the GPIO number to the corresponding RTC_GPIO number.
The ULP does not have a hardware serial. This is implemented in assembly.
SERIAL.WRITE_CREATE ( pin wait-time "\<spaces\>name" -- )
Skip leading spaces. Parse name
delimited by a space. Create an
assembly definition for name
that uses the pin
RTC_GPIO and
delay wait-time
to write to serial. See the "hello_world.f" example for setup.
For example, if you used the word SERIAL_TX
then it would create the definition:
SERIAL_TX ( c -- )
which outputs the character c
.
SERIAL.WRITE_9600_BAUD ( -- wait-time )
Returns the wait-time to achieve 9600 baud.
SERIAL.WRITE_115200_BAUD ( -- wait-time )
Returns the wait-time to achieve 115200 baud. This is the fastest standard baud rate available with the assembly algorithm used.
The ULP has hardware I2C but it is difficult to use and limited in its features. This is a forth software implementation. It has clock stretching and allows for an arbitrary number of devices, but is slower than hardware.
To use this, you need to implement the deferred words:
I2C.SDA_HIGH
I2C.SDA_LOW
I2C.SDA_GET
I2C.SCL_HIGH
I2C.SCL_LOW
I2C.SCL_GET
See the "util/i2c_scan.f" file for pin setup.
I2C.START ( -- )
Send a start condition on the bus.
I2C.START_READ ( address -- ack )
Send the start condition and send a read command to the address.
Returns TRUE
if acknowledged.
I2C.START_WRITE ( address -- ack )
Send the start condition and send a write command to the address.
Returns TRUE
if acknowledged.
I2C.WRITE ( n -- ack )
Send byte n
on the bus. Returns TRUE
if acknowledged.
I2C.READ ( -- n )
Read a byte n
from the bus. This does not respond with a ack/nack,
that should be done with I2C.ACK
or I2C.NACK
.
I2C.ACK ( -- )
Send an ack bit on the bus.
I2C.NACK ( -- )
Send a nack bit on the bus.
I2C.STOP ( -- )
Send a stop condition on the bus.
These are the the core words that are implemented. Missing words
may be implemented in the future, only a few such as HERE
cannot
be implemented because of the ulp-forth architecture.
Some of the core words are created with DEFER
so they can be
easily overwritten. These are noted below.
Words that can only run on the host are noted as well. Some words,
such as DO
, cannot be directly executed on the ULP but words
created with them can run on the ULP. These are not noted as it
depends on how you attempt to use them, ulp-forth will throw an
error if it cannot be cannot be cross compiled.
'
(
*
+
+!
+LOOP
,
- Can only run on host.
-
.
."
/
/MOD
- Deferred to
S/REM
, can defer toF/MOD
.
- Deferred to
0<
0=
1+
1-
2!
2*
2!
2/
2@
2DROP
2DUP
2OVER
2SWAP
:
- Can only run on host.
;
- Can only run on host.
<
=
>
>BODY
>R
?DUP
@
ABS
ALIGN
- Can only run on host.
ALIGNED
ALLOT
- Can only run on host.
AND
BASE
BEGIN
BL
C!
C,
- Can only run on host.
C@
CELL+
CELLS
CHAR
CHAR+
CHARS
CONSTANT
- Can only run on host.
COUNT
CR
CREATE
- Can only run on host.
DECIMAL
DEPTH
DO
DOES>
- Can only run on host.
DROP
DUP
ELSE
EMIT
- Deferred so a program can output to any interface.
EMIT
is used for all printing, such as.
.
- Deferred so a program can output to any interface.
EVALUATE
- Can only run on host.
EXECUTE
EXIT
FILL
FIND
- Can only run on host.
HERE
I
IF
IMMEDIATE
INVERT
J
LEAVE
LITERAL
LOOP
LSHIFT
MAX
MIN
MOD
MOVE
NEGATE
OR
OVER
POSTPONE
R>
R@
RECURSE
REPEAT
ROT
RSHIFT
S"
S>D
SPACE
SPACES
STATE
SWAP
THEN
TYPE
U.
U<
UNLOOP
UNTIL
VARIABLE
WHILE
WORD
- Can only run on host.
XOR
[
[']
[CHAR]
]
Missing words may be implemented in the future.
.(
0<>
0>
2>R
2R>
2R@
:NONAME
- Can only run on host.
<>
?DO
AGAIN
BUFFER:
C"
- (nonstandard) can be interpreted.
CASE
COMPILE,
DEFER
DEFER!
DEFER@
ENDCASE
ENDOF
ERASE
FALSE
HEX
IS
NIP
OF
PICK
ROLL
TO
TRUE
TUCK
U>
VALUE
WITHIN
[COMPILE]
\
Not all of the double words are currently implemented, but they all can be in the future.
2CONSTANT
2LITERAL
2VARIABLE
D+
D-
D0<
D<
D0=
D>S
DABS
DMAX
DMIN
DNEGATE
M+
2ROT
2VALUE
DU<
The cross compiler includes some optimizations. More may be added later.
- Inline deferred words. If a word is defined by
DEFER
but the deferred word cannot be changed by the cross compiled output, this will inline the word. Normally a deferred word will look like:address-containing-word @ EXECUTE EXIT
, this optimizes that to:word EXIT
. - Tail calls. Words may be defined in assembly or forth. If the final word before an
EXIT
(or end of definition) is a forth word, this will instead jump to it. For example, a word that is compiled as+ forth-word EXIT
will be optimized to+ jump(forth-word)
. Smaller in token threaded model, faster, saves a stack slot.
To be added later:
- Fallthrough forth words instead of tail call
- Assembly inlining (subroutine threaded only)
- Forth inlining
- Forth common sequence compression
- Flow control analysis
- Constant folding
- Peephole optimization