There’s quite a bit of new Forth code in the embello repository on GitHub, and more being added all the time - so it’ll become increasingly important that this code gets documented.

There are many tools to do this. To match the rest of JeeLabs, the preferred solution is a static site - that way access can be snappy, page overhead stays low, and local copies are easy to use.

At the same time, it’s also crucial that this information can be easily kept up to date and extended. If adding docs is tedious, it simply won’t happen…

The solution chosen for embello is GitHub Pages. Since the repository is already on GitHub, and since GitHub Pages now supports documentation in a docs/ subdirectory (as opposed to a separate branch or repo), it has become trivial to maintain everything in one place.

Through the magic of a custom domain, and GitHub’s use of a CDN, the pages will be served by GitHub as well as many local mirrors, while being reachable through a very obvious url:

http://embello.jeelabs.org

The main page shows a brief overview and explanation of how the embello repository is structured, and contains two areas with reference information: 1) a list of hardware boards frequently used on the weblog, and 2) the Forth Library Documentation, as shown here:

The list of entries is not very long yet, and the amount of information on each page is not very extensive either at this point, but it’s a start. The point is to get the basic mechanism rolled out so that the rest can grow organically in the coming weeks and months.

Anyone obtaining a git clone, ZIP snapshot, or TAR snapshot will also get a copy of the documentation in its current state. It’s all open source, and free for any use.

To avoid having to maintain information in two places, the API documentation is extracted from the actual Forth source files, which need to contain word definitions of the form:

: blah ( n1 n2 -- f )  \ brief description of blah

Variables and constants will also be extracted by this documentation tool, which can be found in the tools/docex/ area of embello. The result is a documentation site which is set up entirely in Markdown, offers snappy static page access, and should be easy to update and keep in sync with the actual code definitions. Missing and misspelled files & words will be caught by docex.
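As a rough illustration, a source file following that convention might look like the sketch below. All names are made up for this example, and the assumption that variables and constants carry the same trailing "\ description" comment as word definitions is mine; check the docex tool for the exact format it expects. Note that in Mecrisp, variable takes its initial value from the stack.

```forth
\ Hypothetical source file - all names here are illustrative only.
10 constant samples        \ number of readings to sum
0 variable total           \ running sum of the readings

: add-sample ( n -- )  \ add one reading to the running sum
  total +! ;
```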

Another convenience is that every git commit will automatically re-generate the site.

Creating (and maintaining!) documentation is a long-term task. Your help is most welcome:

Please point out errors and missing docs by submitting a new issue on GitHub. Each mistake and omission deserves a fix, and will be given attention - with lots of tender love and care.
You’re welcome to clone the embello repository for your own use, and to make changes or add new documentation, just like you would to get your own code added to the embello repository. The preferred mechanism to make this happen is to submit a Pull Request on GitHub.
For discussion or any comments / tips about this documentation area, please visit the forum.

This is a step to help streamline the process of documenting it all. Search is not implemented by this static documentation site, but once it has been indexed a bit, you should be able to use a search engine by adding “site:embello.jeelabs.org” to your search query.

Will embello prosper & flourish? Only time will tell. Will this help make it usable? Definitely.

The PCB panels are in! They are produced by PCBcart, and arranged as 10 x 3 units:

With a nice blue soldermask + gold plating, just like all the other official boards from JeeLabs.

And here’s a close-up of the metal stencil, which will be used to apply solder paste:

A first test has been successful, but further testing will be needed to make sure that everything works as intended: all the pins properly connected, no shorts, correct silkscreen labels, etc.

This is what the top looks like with the optional coin cell holder and UFL antenna connector:

If everything goes according to plan, the JeeNode Zero rev4 will be ready one week from now. The boards will be available and shipped from Digital Smarties UK.

All documentation and design files for the JNZ can be found on the new documentation site.

One of the features of the JeeNode Zero is that it takes minimal effort to get started: hook it up via any USB serial interface, using any terminal emulator you like, and you’re all set to go.

Here’s a PL2303-based USB interface, connecting power (+5V/GND) and serial pins (RX/TX):

On the host side, you need a terminal emulator such as Picocom or PuTTY, which can connect to a serial port and deal with lines coming back from the JNZ ending in LF instead of CR+LF.

Picocom is available on macOS (“brew install picocom”) and Linux (“sudo apt-get install picocom”). The trick is to add “--imap lfcrlf” when setting the baud rate:

picocom -b 115200 --imap lfcrlf /dev/your-usb-serial-device

It takes even less effort with Folie, which was designed specifically for use with Mecrisp Forth and which is available as an executable for several platforms: simply download and unzip. See the releases page on GitHub for details.

Whichever USB interface and terminal emulator you use, once connected and started up, you should be able to talk directly to the JeeNode Zero and see its interactive command prompt. No toolchain, no compiler, no uploads, nothing. Everything takes place on the JeeNode Zero.

The LED on the JNZ lights at power up, if Mecrisp and the default runtime code are present. The first thing to try would be to toggle the LED, and then repeat this a few times:

led iox! <enter>
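To save some typing, the toggle can be wrapped in a small word. This is only a sketch: the name blink is made up here, and it assumes the ms delay word from the pre-loaded code (used further down in this text) is available. Since it is typed in at the prompt, the definition ends up in RAM and disappears on the next reset.

```forth
: blink ( n -- )  \ toggle the LED n times, half a second apart
  0 ?do  led iox!  500 ms  loop ;

10 blink
```

An even count leaves the LED in the state it started in.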

Welcome to the JeeNode Zero and the world of 32-bit ARM with Mecrisp Forth!

We can make life easier by using a USB BUB or equivalent, which also controls the DTR pin:

This adds the ability to reset the JNZ from Folie, by simply hitting CTRL-C:

$ folie -r
Select the serial port:
1: /dev/cu.Bluetooth-Incoming-Port
2: /dev/cu.usbserial-A600dW4s
? 2
Enter '!help' for additional help, or ctrl-d to quit.
[connected to /dev/cu.usbserial-A600dW4s]
[...]
!reset
Mecrisp-Stellaris RA 2.3.3 with M0 core for STM32L053C8 by Matthias Koch
64 KB <jz4> 3B5E0728 ram/flash: 4960 21248 free ok.

It may not seem like much, but the JNZ does not have a built-in reset button, and resets are extremely common during Forth development (and nothing to be ashamed of, it’s a really convenient way to get control back) - CTRL-C without shifting your focus from keyboard-and-screen is in fact more convenient than a reset button.

For re-flashing, i.e. actual firmware uploading using the µC’s built-in ROM boot loader, you will need a BUB III (or a modified BUB II), which connects the RTS signal to pin 2 of the FTDI header (between GND and 5V). The good news is that you rarely need to reflash the JeeNode Zero - unless you lock yourself out of the command prompt, which is possible but uncommon.

At this point, it’s up to you what to do with the JNZ - perhaps read out some analog values?
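A minimal sketch of what that could look like, using the adc-init, adc, and pa0 words as they appear in a transcript later in this text. The word name readings and the choice of pin PA0 are arbitrary picks for this example:

```forth
adc-init

: readings ( n -- )  \ print n readings of pin PA0, one per second
  0 ?do  cr pa0 adc .  1000 ms  loop ;

5 readings
```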

Here is an example which uses the OLED and graphics driver that is pre-loaded in each JNZ:

As it so happens, the pinout of those very popular 128x64 OLED displays on eBay matches the GND/VCC/SCL/SDA pins of the JNZ, which is why the OLED can be attached without even requiring any jumper wires. Just be sure to get the 4-pin I2C version of that OLED display.

We can verify that the OLED is detected at address $3C on the I2C bus by typing this:

i2c-init i2c. <enter>

Note the period at the end of “i2c.” - it’s part of the name. The output will be:

i2c-init i2c.
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- 3C -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --                         ok.

Showing the JeeLabs logo on the OLED is now a matter of entering this at the Forth prompt:

lcd-init show-logo <enter>

That’s it. Working! Not a very exciting result perhaps, but it’s an example of how having some drivers permanently present in the JNZ’s flash can turn it into a very simple interactively-programmable device. Coming up next: how to manage drivers and flash memory on a JNZ.

PS. Did you notice that the JeeNode Zero fits exactly in a mini breadboard? That’s intentional!

An application written in Mecrisp Forth consists of a number of different parts:

The Mecrisp kernel itself: this is 20 KB of Matthias Koch’s hand-crafted assembly code, turning a µC into a Forth compiler / engine, with over 300 pre-defined “words”.
Compiled code, stored in flash memory - these will always be present on power-up.
Compiled variables, buffers, and code, stored in RAM - variables will be reset to their initial values on power-up and after each reset, but any code in RAM will be gone.

As delivered, the JeeNode Zero rev4 comes with ≈ 23 KB of compiled Forth code pre-installed in flash memory, leaving ≈ 21 KB of flash for additional code, as you can see in the greeting:

Mecrisp-Stellaris RA 2.3.3 with M0 core for STM32L053C8 by Matthias Koch
64 KB <jz4> 3B5E0728 ram/flash: 4960 21248 free ok.

Forth compiles code to a “dictionary”, which is essentially a growing stack of word definitions. There is a dictionary in flash, and there’s a second one in RAM. Conceptually, words defined later override earlier definitions, and appear later in the dictionary. Words defined in RAM override words defined in flash, i.e. lookup starts in RAM, and continues in flash if not found.

It’s all very clean, but there is a small gotcha:

compiletoflash  ok.
: a ." flash!" ;  ok.
compiletoram  ok.
: a ." ram!" ; Redefine a.  ok.
compiletoflash  ok.
a flash! ok.
compiletoram  ok.
a ram! ok.
forgetram  ok.
a flash! ok.

So while compiling to flash, the words in RAM are not visible! The reason for this is that words in flash can’t refer to words in RAM, as these would be gone on the next power cycle or reset. Note that you can refer to code in RAM via variables, and run them from flash using execute.
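That last trick could be sketched as follows. The names hook, run-hook, and hello are invented for this example. The variable and the word which calls through it live in flash, while the code being called lives in RAM. After a reset, hook is back at its initial value of zero, so run-hook quietly does nothing until a new word is stored in it.

```forth
compiletoflash
0 variable hook                \ holds an execution token, or 0 for "none"
: run-hook ( -- )  \ call whatever hook points to, if anything
  hook @ ?dup if execute then ;

compiletoram
: hello ( -- ) ." hello from RAM" ;
' hello hook !                 \ point the flash-based hook at the RAM word
run-hook                       \ runs hello
```

Keep in mind that hook and run-hook are appended to flash by this, and will stay there until flash is cleaned up again.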

Development is very different from development in C or C++, because in Forth everything happens on the µC itself. This leads to a different way of structuring code - it helps to make a clear distinction between: on the one hand drivers and other relatively stable code, and on the other hand code which is currently being written and debugged.

A really effective way to develop code in Mecrisp Forth is to load all stable code in flash, and to keep all work-in-progress code in RAM. That way, a simple reset always restores the µC to a clearly defined state - no matter how bad the bugs are and no matter how many hardware settings the new code might have messed up.

Which is exactly why the JeeNode Zero comes with a fair amount of pre-installed code. You won’t have to install anything to try out the ADC, PWM, I2C, SPI, or the RF69 wireless radio driver - they are all available out of the box. Even basic OLED support and graphical and text display primitives plus a small font are pre-installed.

This burnt-in code is split into three sections:

Always - this code really should hardly ever need to be replaced, and may in fact require special tricks to update (on the F103, this is the case for the USB console driver)
Board - this code implements drivers for the most important hardware peripherals, such as GPIO, ADC, PWM, I2C, and SPI - it also defines the LED constant (pin PA8 on a JNZ rev4)
Core - this part can easily be replaced, depending on what you need for a specific project - as pre-installed, it contains among others, drivers for the RFM69 radio module and the OLED

On a JNZ rev4, each section is defined by a source file, which in turn includes everything else:

jz4/always.fs currently only defines cornerstone, which is used to mark off each section
jz4/board.fs defines the pins assigned to LED + RFM69 and adds essential drivers from flib

jz4/core.fs is the most interesting part, and can easily be customised. Here is the main code:

<<<board>>>
compiletoflash
include ../flib/spi/rf69.fs
include ../flib/any/varint.fs
include ../flib/i2c/ssd1306.fs
include ../flib/mecrisp/graphics.fs
include ../flib/mecrisp/multi.fs
cornerstone <<<core>>>

The logic of the core.fs file is as follows:

it’s meant to be uploaded via Folie’s “!s core.fs” command and since it refers to other files by a relative path, this must be sent from a specific directory (more on that below)
<<<board>>> is a “cornerstone” defined as a last step in the board.fs file and calling it erases all definitions from flash which have been defined after board.fs was installed
since normally core.fs is loaded right after board.fs, the result is that re-sending the core.fs file will erase itself and everything newer, and then save updated definitions
as a last step, a cornerstone called <<<core>>> is defined - by calling this later on, you can erase all definitions added afterwards, so this restores flash to a known state

Since it’s the last word defined in the above sections, you can type “<<<core>>>” whenever you want to reset flash memory to that “standard” state (note that cornerstones always end with a software reset, so this also wipes out anything in RAM).

Cornerstones provide a nice mechanism to manage flash memory - they act as markers to erase all newer definitions after their own position in the dictionary. Note that cornerstones can only be used in flash memory. For clearing all RAM definitions, there is forgetram.
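The same mechanism can be used for your own experiments. Here is a small sketch, assuming the cornerstone word from always.fs is present (as it is in the pre-installed code). The names <<<scratch>>> and hello are made up for this example:

```forth
compiletoflash
cornerstone <<<scratch>>>      \ hypothetical marker name
: hello ( -- ) ." hello from flash" ;

compiletoram
hello                          \ works, and would survive a reset
<<<scratch>>>                  \ erases hello (and anything newer), then resets
```

After this, hello is gone but the <<<scratch>>> marker itself is still in flash, ready to be used again. Calling “<<<core>>>” removes the marker as well.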

To install a customised version of core.fs on a JeeNode Zero:

Get a copy of the embello repository on GitHub, either as a download from the home page or (preferably) as a git clone, which makes it much easier to track changes.
Go to the directory with the relevant files, i.e. “cd explore/1608-forth/jz4/” and make changes to core.fs - you may have to re-order includes in case of dependencies.
Launch Folie (add “-r” option when not using a SerPlus) and enter “!s core.fs” - you should see a number of messages, as flash gets erased and all the included files are sent.

Here is a transcript of the send process, omitting most of the “Erase” lines for brevity:

  ok.
!s core.fs
1> core.fs 3: <<<board>>>

Erase block at  00008700  from Flash
Erase block at  00008780  from Flash
Erase block at  00008800  from Flash
[...]
Finished. Reset 
Mecrisp-Stellaris RA 2.3.3 with M0 core for STM32L053C8 by Matthias Koch
64 KB <jz4> 3B5E0728 ram/flash: 6804 30976 free ok.
1> core.fs 4: cr compiletoflash
 ok.
1> core.fs 5: ( core start: ) 00008700  ok.
1> core.fs 13: ( core end, size: ) 0000ACA8 9640  ok.

The first effect is that the flash dictionary will be reset, releasing all the memory used by the previous version of core.fs and whatever might have been added to flash later on. The second effect is that a fresh core.fs configuration will be compiled and saved in its place.

By installing new drivers and dropping the ones you don’t need, each JeeNode Zero can be configured exactly as required, but keep in mind that even with a standard core.fs setup, modified drivers can also be loaded in RAM or added to flash, superseding prior definitions.

Redefining a word to supersede the previous version is common practice in Forth (you’ll get a “Redefine” message whenever this happens), but beware that older definitions will continue to refer to the original code (“early binding”). Here is an example to illustrate that behaviour:

: a 123 . ;  ok.
a 123  ok.
: b a a ;  ok.
b 123 123  ok.
: a 456 . ; Redefine a.  ok.
b 123 123  ok.
: b a a ; Redefine b.  ok.
b 456 456  ok.

If you keep this behaviour in mind, there’s usually less need to erase and reload code in flash. Instead, you can simply load that code again in RAM, or append it to flash. Once you run out of free memory, that’s of course a good reason to do a full erase/re-flash as described earlier.

These same approaches can be used for board.fs, but don’t forget to reload core.fs as well.

To try out RF communications, we need to go through a number of steps:

hook up two JeeNode Zero boards, so we can develop on both in parallel
work out the code needed for the receive and send nodes
lower the average power consumption of the send node
install the send code in flash so it can run unattended
add a coin cell, and turn the sender into a standalone unit
as a bonus, we’ll also update the node with a modified version

Let’s jump right in… by the end of this one article, you’ll have it all.

1. Preparing two JNZ’s

Here is a 2-node setup, one via a USB-BUB type board, the other via a HyTiny-based SerPlus:

By running two instances of Folie, each in their own terminal session, and making sure that we can connect and easily reset ‘em with Ctrl-C, everything is ready for some Forth development.

2. Setting up RX & TX nodes

On the receiving end, it can’t get any simpler than entering “rf-listen”.

For sending, the first test is to manually send one packet with “123 rf-txtest”.

The result is that you should see one packet come in for each one sent out:

TX node, typed in	RX node, printed out
	`rf-listen`
`123 rf-txtest`
	`RF69 21EB2ACA01FDA0803D03 313233`
`45678 rf-txtest`
	`RF69 21EB2AC401FD52803D05 3435363738`
`9 rf-txtest`
	`RF69 21EB2AC301FDC0803D01 39`

As you can see, the payload contains the ASCII representation of the number sent out, in hex.
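This is easy to check at the Forth prompt. The last three bytes of the first packet are 31, 32, and 33 in hex, and sending those to the console as characters gives back the digits which were typed in on the TX side:

```forth
$31 emit $32 emit $33 emit     \ prints 123
```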

Excellent! This shows that all the hardware involved is doing its job.

Next, let’s continuously send one packet per second, with an incrementing counter in it:

: blips 1000000 0 do i rf-txtest 1000 ms loop ;
rf-init 0 rf-power blips

It’s the same rf-txtest call, but now as part of a loop. The receiver output will be:

RF69 21EB2AC301FE60803D01 30
RF69 21EB2AC201FF42803D01 31
RF69 21EB2AC301FF4E803D01 32
RF69 21EB2AC301FF9A803D01 33
RF69 21EB2AC301FC92803D01 36

Not all packets are guaranteed to arrive. As you can see, we got packets 0, 1, 2, and 3, missed packets 4 and 5, and then got packet 6. That’s to be expected - it’s not that these sensitive radios are so unreliable, we’re simply placing them far too close together - even at minimal power, the radio waves from the transmitter will overload a receiver placed only a few cm away!

To quit the infinite send loop, we can press Ctrl-C to get the Forth prompt back. Since this resets the JNZ, all code in RAM will be lost.
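If losing the RAM definitions gets tiresome, one alternative is to let the loop end by itself as soon as something arrives on the serial port. This is just a sketch, not the code used in the rest of this article: it keeps the counter on the stack and relies on Mecrisp’s key? word to check for pending input.

```forth
: blips ( -- )  \ send a counter once a second, until input arrives
  0 begin
    dup rf-txtest  1+          \ send the counter, then bump it
    1000 ms
  key? until
  drop ;

rf-init 0 rf-power blips
```

Sending any input (with Folie: pressing enter) ends the loop and brings back the prompt, with blips still defined. Whatever was sent is then processed as normal input, so an empty line is the safest choice.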

3. Low-power transmissions

The transmitting node is now sending one packet every second. The rest of the time it’s just twiddling its thumbs and wasting energy. Let’s fix that:

: blips 1000000 0 do i rf-txtest rf-sleep stop1s loop ;
lptim-init rf-init 0 rf-power blips

As before, running this should produce new output once a second on the receiver end.

Actual power consumption measurements will have to wait for another time, but the average current draw is likely to be under 10 µA, some three orders of magnitude lower than before.

Which means we’re almost ready to run the transmit node off a coin cell - but there’s a snag: the blips code lives in RAM, so the send node won’t start when simply powering it up.

4. Unattended operation

For such unattended operation, we need to save the blips code in flash and add logic to start it up immediately after power-up (or reset). Appending to flash is a matter of telling Mecrisp:

compiletoflash
: blips 1000000 0 do i rf-txtest rf-sleep stop1s loop ;

Automatic startup is also easy: Mecrisp looks for an “init” word on startup, and launches it.

But it’s also a bit risky. What if there is a mistake in the code, or we want to change it later? Once a new init is installed, we lose the default command-line prompt! Even a hard reset won’t get it back, since it’ll simply launch that same init override again.

The solution comes in the form of a special unattended word, defined in board.fs. This is designed specifically for making init overrides safe:

compiletoflash
: init init unattended lptim-init rf-init 0 rf-power blips ;

Some notes on what’s happening, since it’s fairly critical to get this right:

we re-define the init word, overriding its original definition
first we have to call the original init, which is why it appears twice
then we call unattended, which will act as a safety escape hatch
lastly, all the code that is to be run on startup follows

The unattended escape hatch is that when connected to a serial port, anything after it will be skipped. Instead, this init definition will immediately return to the Forth command prompt.

If you run the above and reset the board, you will see that nothing happens. You still have to start everything up by manually entering “lptim-init rf-init 0 rf-power blips”.

But that’s exactly the point. Even when a node has been set up to automatically go live without a serial FTDI connection, the development workflow remains unaffected.

5. Running off a coin cell

And now the real test - we take the JNZ out of its FTDI connector and insert a coin cell:

(see the red glow? more on that in a moment…)

And here’s some sample output printed by the receiver, while moving away a bit:

RF69 21EB2AC201009C803D02 3231
RF69 21EB2AB801FF34803D02 3232
RF69 21EB2AA3020026803D02 3233
RF69 21EB2AB101FDA6803D02 3234
RF69 21EB2AB40100FE803D02 3235
RF69 21EB2AC0010078803D02 3236
RF69 21EB2A9E02002A803D02 3237
RF69 21EB2A9F02002C803D02 3238
RF69 21EB2A9F02002A803D02 3239
RF69 21EB2A9D020012803D02 3330

It’s alive!

But… whoops, there’s a silly little mistake in there: the LED is still on, drawing some 2 mA. Leaving that LED on would drain the coin cell within a matter of days…

6. Updating the code

So as a last step, here is how to fix the LED issue by installing an improved version:

Remove the coin cell (this is important, it cannot coexist with power from FTDI!)
Put the JNZ back in its FTDI adapter, and it’ll show the normal welcome message again.
Remove the code we added to flash by entering “<<<core>>>” to restore the flash memory to its previous state. This was explained in the previous article.

Improve the code, i.e. we can simply add a call to turn the LED off on startup:

compiletoflash
: blips 1000000 0 do i rf-txtest rf-sleep stop1s loop ;
: init init unattended led-off lptim-init rf-init 0 rf-power blips ;

Now disconnect from FTDI again, put the coin cell back in, and we’re done.

That’s all. With a mean current consumption in the 10 µA’s (i.e. sleep mode + transmit power use), this node should be able to run at least a year or two on a fresh coin cell. Piece of cake!
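As a back-of-the-envelope check, the numbers can be worked out right at the Forth prompt. Note the assumptions: the 220 mAh capacity (220000 µAh) is a typical figure for a CR2032-type coin cell and not something measured here, and the 10 µA average is the estimate from above, not a measurement either.

```forth
\ Assumption: 220 mAh usable capacity, i.e. 220000 µAh.
220000 10 / 24 / .             \ days at 10 µA average: prints 916
220000 2000 / .                \ hours with the LED drawing 2 mA: prints 110
```

So that’s roughly two and a half years in the ideal case, versus less than five days with the LED left on. Real-world battery life will be lower, due to self-discharge and the current peaks during transmission.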

One of the requirements of the JeeNode Zero rev4 is that each one has to be tested and then end up with the proper software loaded onto it. The obvious way to do this is to connect each board over FTDI after assembly, but since the headers are not soldered on at this stage, some sort of temporary hookup trick is needed.

Meet Fiddy, a 3D-printed clip which turns 6 “pogo pins” into a clip that can easily hold a PCB:

(Note: this image was shamelessly copied from Thingiverse)

This design was created and most generously shared by Timothy Reese, and there’s a very nice tutorial on the AdaFruit website about how to produce the clip and set it all up.

Ok, that should solve the initial upload, since FTDI with DTR and RTS hooked up is sufficient to re-program any STM32 chip via its built-in ROM boot loader. But what about testing?

The test requirement for this initial batch of JNZ rev4’s was kept relatively simple: make sure FTDI & µC work, verify that the LED works (it’s so easy to mount it the wrong way around!), and verify that the RFM69 module works for both transmit and receive. The reasoning is that with a working Forth interpreter installed, any special problems with a board can be tracked down in the field, since it’s easy to enter a bit of Forth and toggle any I/O pin, for example.

And of course, as part of the assembly process, all boards are visually inspected for obvious faults, such as “tombstones” (i.e. a tiny SMD resistor or capacitor ending up vertical due to excessive capillary pull on one side during reflow) and solder bridges between pins.

For testing the radio module, a second unit is needed. The approach selected as a quick solution for now was to take a Blue Pill with an RFM69 radio added, which listens to packets on a non-standard frequency, and then echoes a packet back.

The code for this Test Echo Node is quite simple - here’s what the first version looked like:

: echo
  870 rf.freq !  rf-init  0 rf-power
  begin
    rf-recv ?dup if rf-txtest then
  again ;

echo

We set the radio to 870 MHz, well away from the 868 MHz band, and we set its transmit power to minimal so that it’s unlikely to disrupt anything over a few meters away. Then we continuously listen for a packet, and when one is received, we send out a packet with the length of that received packet. The contents really don’t matter - anything will work as a basic reply.

With this approach we can let the Blue Pill echo continuously, while each new JNZ-under-test attempts to get a packet out and then read back a reply. On reception, we know it works.

This is the test setup, as actually used for the first batch of JNZs:

One requirement was that the JNZ should be tested without antenna attached, since that’s how these units are to be sent out. This is not so hard, given the proximity of the echo node, but it did require a bit of twiddling of RFM69 registers. And bumping the TX power a notch.

The last puzzle to solve was perhaps the most interesting one: how do you run a test on a fresh board and then end up with only the final release firmware once the test succeeds?

That’s where the flexibility of Mecrisp helps: we can create an image with all the release code in it (always/board/core) and then append a test with its own override of the init word.

When flashed this way, it starts the test init (since it’s the last one defined in the dictionary), and the test will run. And now the trick: when the test completes successfully, all it has to do is execute “<<<core>>>”. This removes the test code from flash memory and resets the board.

Since the LED is turned on by default on the release image, seeing the LED turn on is a signal that the test completed successfully, and that the board can be removed and accepted.

The full test code can be found here, but an extract of the main logic is as follows:

: blip 
  870 rf.freq !  rf-init  8 rf-power
  begin
    i rf-txtest
    5000 0 do
      rf-recv ?dup if
        <<<core>>>  \ clears test code and does a s/w reset
      then
    loop
    1000 ms
  again ;

: init init led-off ( unattended ) blip ;

The actual code has some more refinements, but this’ll give you an idea of the whole process: the test runs, and once successful, it wipes itself, leaving the µC’s flash in its final release state.

It all worked like a charm. Of the 147 boards built, one probably has a short in the PCB and had to be rejected. For the rest, there were some issues with the soldering of the special (sideways-mounted!) LED, and a handful of tombstones. Each of these was fairly easy to spot and repair. One radio was mounted the wrong way around (doh!), and one µC chip ended up with its pads shifted over by one - nothing a bit of patience and heat couldn’t cure. So all in all, we had a success rate of over 99%, with > 90% of the boards coming out perfect right away, and the rest easily reworked and fixed. The software test was essential in catching all “outliers”.

The one downside which quickly became apparent is that a complete firmware upload takes about 22s per board (the test is nearly instant). On a larger batch, that could well become a bottleneck - but a fix is already in the making: a new test jig will make the process 4x as fast by using the L052’s built-in ROM SPI-upload. Driven by Forth - probably an extra JNZ, in fact.

As everyone knows, the later a bug shows up in production, the more trouble it is to fix…

A few weeks ago, a batch of fresh JeeNode Zero rev4 boards was assembled, in itself the result of quite a bit of experimentation, design work, writing code, trying out things, rearranging the pins - and all of that across a couple of iterations.

Revision 4 is intended to be very close to the “final” design of the JeeNode Zero, and the only way to figure out those last details is to get it out there and push it further.

After assembly, the Mecrisp Forth software was installed, along with a fair amount of code from the embello/explore/1608-forth/jz4 directory on GitHub. So far, so good, it all worked like a charm, the boards worked, the software did its thing, and the radio echo test also passed.

Ready to go, right? So last week, 140 boards were sent off to the UK, where Digital Smarties handles orders, fulfilment, and everything that comes with it.

For one order in the NL, it was much simpler to send it direct, so one unwitting guinea pig got the board a couple of days early …

… and immediately ran into a bug, right with his very first exploration!

One board worked fine, the other consistently failed. That’s a hardware issue, right? But we’d tested them all, and we were pretty certain the hardware worked (that’s the point of tests after all!).

Then, just to try and understand it, a board was set up at JeeLabs, and … the same test failed.

It wasn’t consistent though, the failure seemed intermittent. And the odd bit: once it went away, it never came back. Hardware? Software? It made no sense!

A trivial test like 1 2 + . worked fine (printing 3), every single time. The board also always came up fine after power-up and after reset, every single time.

At this point, climbing up a wall became an attractive thought - how can you debug something, which works on some boards, not others, goes away (!) after a while, and shows that both the hardware and software are performing as expected just about always? It just made no sense.

(which is of course not uncommon with non-trivial bugs …)

The next step luckily started to shed some light. The original failure report was as follows:

adc-init  ok.
pa0 adc . 38  ok.
: forever begin cr pa0 adc . 1000 ms again ; forever 

Unhandled Interrupt 00000003 !

Stack: [0 ]  TOS: 0000002A  *>

Calltrace:
00000000 00005285 ( 00005256 + 0000002E ) ct-irq
[etc...]

Keep in mind that this code didn’t fail on every board, and that even on the same board it sometimes passed, and then never failed again, not even after a reset.

Could it be the power supply? The rev4 has a larger capacitor than rev3 (100 µF instead of 22 µF). Nope, that’s not it: the datasheet says that the rise time of Vcc is allowed to be arbitrarily long.

Could it be the ADC, or the systick timer interrupt? Nope, even this tiny test failed:

: forever begin again ; forever

Ehm… sometimes. Getting the failure to happen took some effort. And then it became clear: only a power down of at least a few seconds would trigger the problem, and then in fact it turned out to be fully reproducible, at least on the same board!

Could it be all that embello code added to Mecrisp and stored in flash? After all, the init word override does a few things on each reset. This was ruled out by erasing all words added to flash. And that forever loop kept failing.

And now the crazy part - this code passed every single time:

: a ;
: forever begin again ; forever

Yet this failed:

: a ; a

Finally, a first hypothesis emerged for what might be happening: maybe the generated code was bad, and maybe it was only bad right after power up. But not all the time, else this would have become apparent a long time ago (Mecrisp Forth’s compiler is used all the time, by anyone typing word definitions into it!).

But how can this be? If the generated code was bad, wouldn’t that end up getting compiled into flash memory and lead to builds which would consistently fail?

The next clue came from the disassembler, which is available as an add-on in Mecrisp. By loading this into flash memory, we can disassemble the code before running it. And yep, something strange turned up:

Mecrisp-Stellaris RA 2.3.3 with M0 core for STM32L053C8 by Matthias Koch
: a ;  ok.
see a
20000454: AEAD  add r6 sp  #2B4
20000456: 2600  movs r6 #0
20000458: 43F6  mvns r6 r6
2000045A: E000  b 2000045E
2000045C: 2600  movs r6 #0
2000045E: 4770  bx lr
ok.
: a ; Redefine a.  ok.
see a
20000468: 4770  bx lr
ok.

The first definition of a generates different code from the second one. That can’t be right.

And now the puzzle falls into place:

if there is a problem with uninitialised RAM in Mecrisp, then that would only happen on some units, and it would not reappear once the init is done
a > 5s power down will reset RAM to the same state, whereas a brief power loss will not (bits tend to linger a bit in the same state after power loss)
the first word definition generates bad code, but once initialised, all is well

Time to get in touch with Matthias - Mecrisp’s author - who was just as surprised by this, and immediately dove in and started checking all the code paths into the code optimiser.

And within hours, he nailed it:

if init is called on power up (which is not the case on the original Mecrisp, but always the case for the “spezial” mods used on all embello builds), then there is one path where an essential compiler init step is bypassed (the Mecrisp core is all assembly code)

After that, it was a matter of time before Matthias had a fix, the fix got tested, verifying that it handled the exact case we ran into, and… Mecrisp 2.3.5 was released, less than an hour later!

And now the race was on, as always with such “late” issues: generating the new image to flash onto a JeeNode Zero rev 4, making absolutely sure that this is “the fix and nothing but the fix”, and then collaborating with the UK shop to get an upload mechanism going, getting the code onto each unit, and verifying that the upload succeeded and that each unit starts up properly.

The result is that no-one receiving JNZ’s from the UK shop will ever run into this!

It’s been an interesting experience. It’s also a fantastic tribute to open source, with Matthias stepping in on extremely short notice, and exactly identifying and squashing this nasty bug.

And for once, the shipping delay from NL to UK has allowed us to beat the odds :)

This exploration is about connecting a rotary encoder switch for use as an infinitely adjustable up/down controller. The basic idea is that there are two switches inside, which generate pulses. In the simple rotary knobs, these two switches are open in the “click” position, and closed in a very specific way in between click positions (also known as “detents”).

Note that this article isn’t about “here’s a library, plug it in and you’re done”, but a summary of the steps taken to make it work in Forth on a JeeNode Zero, gradually improving the code and making it do more things. In short: it’s about the journey!

The unit below has a common middle connection, with the switches on the left and right pins:

In Forth code, this can be represented as:

PA5 constant ENC-A
PA3 constant ENC-B
PA4 constant ENC-C  \ common

The trick is to tie the C pin to ground, and enable internal pull-ups on the A and B pins:

IMODE-HIGH ENC-A io-mode!
IMODE-HIGH ENC-B io-mode!
OMODE-PP   ENC-C io-mode!  ENC-C ioc!

And of course the first thing to try, is simply to verify that something is happening. So let’s write a small loop which reads out the A and B pins periodically and prints them out. In Mecrisp Forth, loops need to be inside a definition, which we’ll call read-enc here:

: read-enc
  begin
    cr ENC-A io@ . ENC-B io@ .
    500 ms
  again ;

Now we can type read-enc to start the loop, and very slowly turn the knob, given that the switch changes only appear between the clicks:

-1 -1
-1 -1
-1 0
0 0
0 0
0 0
0 -1
-1 -1
-1 -1

Hey, look, there’s something going on! Looks like this hookup is working!

To understand the logic of this, we need to look at the description in the datasheet:

This particular encoder has 24 detents per full rotation (others may have 12), and we can see that there are four edges between each detent / click. Most importantly, the direction of the rotation determines the pattern:

when rotated clockwise, the A pin pulses low before the B pin
when rotated counter-clockwise, the B pin pulses low before the A pin

And that’s exactly the point of this encoding. Note also that only one pin changes at a time, no matter which way the knob is turned, or how fast. This is called Gray code, and it’s extremely useful in real-world signal processing, because it eliminates nasty non-deterministic timing problems (a bit like race conditions in software).
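To make the one-pin-at-a-time property concrete, here is a small host-side Python sketch (not Mecrisp code). It assumes the clockwise sequence implied above, with A in bit 1 and B in bit 0 and both pins reading high at a detent; the names `CLOCKWISE` and `bits_changed` are made up for this illustration.

```python
# One full clockwise click as pin levels packed into two bits:
# A is bit 1, B is bit 0, and both pins read high at a detent.
# Assumed sequence: A pulses low before B, as described above.
CLOCKWISE = [0b11, 0b01, 0b00, 0b10, 0b11]

def bits_changed(prev, curr):
    """Count how many of the two pins differ between two readings."""
    return bin(prev ^ curr).count("1")

# In a Gray code, every step from one state to the next flips exactly one bit.
changes = [bits_changed(p, c) for p, c in zip(CLOCKWISE, CLOCKWISE[1:])]
print(changes)
```

The list has four entries, one for each of the four edges between two detents, and each entry is 1.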

Enough theory. We need to turn these pulses into counter changes, because the real goal is to use the knob to increase or decrease a software counter when turned.

There are many different ways to do this, but let’s start as simple as possible: a continuous loop, checking the pin states, and a little lookup, based on previous and current values of the A and B pins. Here’s a modified version of read-enc, which uses some bit shifting tricks:

: read-enc
  %11  \ previous state, stays on the stack
  begin
    2 lshift  \ prev pins in bits 3 and 2
    ab-pins tuck  \ new pins, also save as previous for next cycle
    or  \ combines prev-a/prev-b/curr-a/curr-b into a 4-bit value

    \ process this 4-bit value and leave only prev state on stack
    case
      %0001 of -1 step endof
      %0010 of  1 step endof
      %0100 of  1 step endof
      %0111 of -1 step endof
      %1000 of -1 step endof
      %1011 of  1 step endof
      %1101 of  1 step endof
      %1110 of -1 step endof
    endcase
  again ;

On each iteration through the loop, we move the A & B bits 2 to the left, then put the new values of the A & B pins into bits 1 and 0, respectively. This is based on some extra code:

1000 variable counter

: ab-pins ( -- n )  \ read current A & B pin state as bits 1 and 0
  ENC-A io@ %10 and  ENC-B io@ %01 and  or ;

: step ( n -- )  counter +!  cr counter @ . ;

As Forth is strictly bottom up, counter, ab-pins, and step must be defined before read-enc.

The step word will increment or decrement the counter, and print it out. So now, read-enc will print changing counter values when the knob is rotated, and stay silent otherwise. We’ve just implemented the basics of a rotary switch decoder!
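The lookup logic can also be tried out on a host machine. Below is a minimal Python sketch (not part of the embello code) which mirrors the eight entries of the case table in read-enc. The two pin sequences are assumptions derived from the datasheet description: A in bit 1, B in bit 0, with A going low first when turning clockwise.

```python
# Same eight entries as the Forth case statement: the key is the 4-bit
# value prev-a/prev-b/curr-a/curr-b, the value is the counter change.
STEPS = {
    0b0001: -1, 0b0010: +1, 0b0100: +1, 0b0111: -1,
    0b1000: -1, 0b1011: +1, 0b1101: +1, 0b1110: -1,
}

def decode(readings, counter=1000, prev=0b11):
    """Feed successive 2-bit A/B readings through the lookup table."""
    for curr in readings:
        # combinations not in the table (e.g. no change) leave the counter alone
        counter += STEPS.get((prev << 2) | curr, 0)
        prev = curr
    return counter

# Assumed pin sequences for one click in each direction (A = bit 1, B = bit 0).
ONE_CLICK_CW = [0b01, 0b00, 0b10, 0b11]
ONE_CLICK_CCW = [0b10, 0b00, 0b01, 0b11]

print(decode(ONE_CLICK_CW))
print(decode(ONE_CLICK_CCW))
```

Under these assumptions, one click moves the counter by four, i.e. one step per edge: from 1000 to 1004 clockwise, and from 1000 to 996 counter-clockwise.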

But that’s still a bit boring, isn’t it? Let’s hook up an OLED and display the counter:

This requires some extra code (and the flib/any/digits.fs font for the nice large bitmaps). Only the step word needs to be changed (the full code can be found on GitHub):

: step ( n -- )  counter +!  counter @ shownum ;

To start things up properly, we now need to type in a few more commands:

lcd-init clear display read-enc

And that’s it. Working code for each of these can be found in the rot1.fs, rot2.fs, and rot3.fs source files on GitHub. If you try this out, you’ll need to resend core.fs before the rot3.fs rotary encoder demo can be used, since it was modified to include the digits.fs bitmaps.

Unfortunately, there’s a problem when using the OLED: the knob appears to work, but only when turned very slowly. Normal turn rates seem to cause only erratic changes in the counter value. The reason for this is quite simple - it will be explained (and fixed) in the next article…

PS. Due to the way the pixels are mapped, the exact same code also works on 128x32 OLEDs:

(apologies for the low image quality: these were taken without flash, using a compact camera)

Before moving on to the topic of this article, let’s figure out the problem that came up in the previous one, where adding an OLED display made the rotary encoder readout unreliable.

The problem is caused by the OLED’s display update code, which takes some time:

lcd-init  ok.
: a micros 1234 shownum micros swap - . ;  ok.
a 48444  ok.

That’s about 48 milliseconds to update the display. Over half of this time is caused by the way digits are drawn on the display, which uses a very crude approach: pixel by pixel drawing (!).

As a result, our code is not tracking pin changes for 48 ms. This not only means it can miss some rotary encoder pulses, but also that it may see some “impossible” transitions from one A/B pin readout to the next. This will happen every time a pulse comes in, i.e. when it matters.
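The effect of such gaps can be imitated on a host machine. This Python sketch reuses the transition table from the polling decoder and feeds it three clockwise clicks, once with every state seen and once with only every second state seen. The skipping pattern is an arbitrary illustration, not a measurement, and the pin sequence is the same assumed one as before (A in bit 1, B in bit 0).

```python
# Transition table of the polling decoder: key is prev-a/prev-b/curr-a/curr-b.
STEPS = {
    0b0001: -1, 0b0010: +1, 0b0100: +1, 0b0111: -1,
    0b1000: -1, 0b1011: +1, 0b1101: +1, 0b1110: -1,
}

def count_steps(readings, prev=0b11):
    """Return the net counter change for a series of 2-bit A/B readings."""
    total = 0
    for curr in readings:
        total += STEPS.get((prev << 2) | curr, 0)  # "impossible" codes add 0
        prev = curr
    return total

three_clicks = [0b01, 0b00, 0b10, 0b11] * 3  # assumed clockwise sequence

print(count_steps(three_clicks))        # every edge is seen
print(count_steps(three_clicks[1::2]))  # only every second state is seen
```

With all states seen, the result is 12 steps. With every second state skipped, the decoder only sees jumps between %11 and %00, which are not in the table, so the result is 0 even though the knob was turned three clicks.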

The solution is to track pulses with interrupts. We could for example check the pins really often, say every millisecond in the SysTick interrupt handler, but it’s going to be a lot more effective to set up “edge-triggered external interrupts” for pins A & B.

Setting up EXTI’s on an STM32 µC is quite involved, as there’s a lot that has to be configured just right: the NVIC has to handle two new interrupts and the EXTI peripheral has to generate those interrupts on each falling edge on PA3 and PA5. The code is in ex/exti.fs on GitHub.

As a test here, two counters are set up and incremented on each falling edge of PA3 and PA5, respectively. We then periodically read out those two counters and print them:

0 variable count3
0 variable count5

[...]

: read-enc
  begin
    cr count3 @ . count5 @ .
    500 ms
  again ;

Let’s see what happens:

count-pulses read-enc
0 0
0 0
23 20
97 67
144 82
159 86
189 130
235 171

Yep, now turning the knob generates interrupts - lots of them, in fact, due to switch bounce!

With interrupts, the problem with slow OLED updates goes away because the pulses can continue to be processed while the OLED code does its thing. It’ll take a little bit more work and code to turn all these interrupts into a quadrature decoder, but that task will be postponed until the next article.

Back to the main goal of this article: cutting the cord!

The goal is to turn this rotary encoder demo into a wireless setup: one JeeNode Zero with the encoder knob, sending its values to a second JeeNode Zero with the OLED mounted:

The sending side is very straightforward, now that this recent article has laid the groundwork:

: step ( n -- )  counter +!  7 <pkt counter @ +pkt pkt>rf ;

I.e. on every step change: send out a packet with format code 7 and the current counter. The way to start this up is:

!s ex/rot4.fs
read-enc

On the receiver end, we can start with the same rxtestv receiver code as before, printing out all the packets that come in:

!s ex/rot4.fs
rxtestv
7 1001
7 1002
7 1003
7 1004
7 1005

And sure enough, it works. Packets are coming in as the knob is rotated! Also worth noting, is that the missed-pulses problem we had with OLEDs is considerably reduced, because sending a wireless packet takes far less time than updating an OLED display.

As a last step, we need to pick up the incoming values and show them on the receiver’s OLED:

: rxtestv ( -- )
  rf-init lcd-init
  begin
    rf-recv ?dup if
      rf.buf 2+  swap 2-  var-init
      var> if drop then     \ ignore the format type
      var> if shownum then  \ show the payload on OLED
    then
  again ;

That’s it. All the code is in jz4/ex/rot5.fs on GitHub. On the receiver, with OLED, we do:

Mecrisp-Stellaris RA 2.3.5 with M0 core for STM32L053C8 by Matthias Koch
64 KB <jz4> 3B5E0728 ram/flash: 4960 18432 free ok.
!s ex/rot5.fs
rxtestv

Whereas on the sender node, with the rotary encoder attached, we do this:

Mecrisp-Stellaris RA 2.3.5 with M0 core for STM32L053C8 by Matthias Koch
64 KB <jz4> 3B5E0729 ram/flash: 4960 18432 free ok.
!s ex/rot5.fs
read-enc

Now every control knob tweak is sent wirelessly to the other node and shown on the OLED. The only wire remaining is for power (and FTDI, since the code hasn’t been saved in flash yet).

The final task will be to really cut the cord on the sender and make the rotary knob portable…

It’s all nice and well, but a JeeNode Zero which needs to remain tethered to a host to set it up after each reset is not very useful. Fortunately, this can be fixed using a simple recipe:

add these two lines at the start of the source file:
```
<<<core>>>
compiletoflash
```
add this line at the end, assuming that the main routine is called read-enc:
```
: init init unattended read-enc ;
```

That’s it. Now the source will first clear any definitions in flash after the <<<core>>> words, and then set up to compile to flash. As the last step, the special init word is redefined to call the core setup, and then our code.

In between sits the crucial unattended word which allows us to switch between development mode and unattended / untethered mode. The distinction is made on startup by checking whether the serial RX input pin on the FTDI header is floating. If so, read-enc will be started.

This way, we’ll always get back to a prompt when plugged in - allowing us to update the flash.

While still attached to FTDI, we simply type read-enc to start it up. So even when in flash and ready to be used in detached mode, development is straight-forward: 1) press Ctrl-C to regain control, 2) send new code with “!s ...”, and 3) enter “read-enc” to launch the new code.

So much for making the node work unattended. The above is sufficient to allow unplugging it from FTDI and inserting a coin cell (remember: never connect power from both FTDI and the coin cell at the same time!). A coin cell will work fine, but… it’ll be drained within days!

We really need to get power consumption down if this “rotary knob node” is to be used for a long time, and even more so if it is to remain in always-on mode.

The first step is to pick the low-hanging fruit to take care of the main “power hogs” (relatively speaking). This can be done with a single line added to the read-enc code:

led-off 2.1MHz only-msi 10 systick-hz

But first, let’s set up a way to measure power consumption:

That’s a µCurrent from EEVblog, which is very convenient for measuring low currents with a standard multimeter - in the back you can see the receiving node (running unmodified code).

The baseline measured with the code from jz4/ex/rot5.fs is 5 mA. That’s quite a lot for a CR2032 coin cell, nominally rated at just 200..230 mAh: only ≈ 40 hours of run time, ouch!

The above one-line change takes care of the two main consumers: LED and µC. The LED is simply turned off, saving ≈ 2 mA, and the µC is set to run at 2.1 MHz instead of its default 16 MHz. The result is a ten-fold reduction: 460 µA. The coin cell will now last for ≈ 18 days.

Still not good enough, but we can take the µC clock rate quite a bit lower by using this line:

led-off 2.1MHz only-msi 65KHz 10 systick-hz

Note that this slow 65 kHz clock rate can only be reached by first switching to the 2.1 MHz clock, and that we also lower the SysTick timer rate to 10 Hz. The result: 45 µA, i.e. 6 months.
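The run-time estimates quoted so far are simply capacity divided by current. Here is that arithmetic as a small Python sketch, assuming the low end of the nominal rating (200 mAh) and ignoring self-discharge, temperature, and the cell’s behaviour under load:

```python
CAPACITY_MAH = 200  # low end of the nominal CR2032 rating quoted above

def runtime_hours(current_ma, capacity_mah=CAPACITY_MAH):
    """Idealised run time: capacity divided by a constant current draw."""
    return capacity_mah / current_ma

# The three currents measured in this article, in mA.
measurements = [
    ("baseline (rot5.fs)", 5.0),
    ("LED off, 2.1 MHz", 0.460),
    ("LED off, 65 kHz", 0.045),
]

for label, current_ma in measurements:
    hours = runtime_hours(current_ma)
    print(f"{label}: {hours:.0f} hours, {hours / 24:.0f} days")
```

This gives roughly 40 hours, 18 days, and 185 days (about 6 months), respectively.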

Here is the full sender-side code, which can also be found in jz4/ex/rot6.fs on GitHub:

<<<core>>>
compiletoflash

PA3 constant ENC-A
PA5 constant ENC-B
PA4 constant ENC-C  \ common

1000 variable counter

: ab-pins ( -- n )  \ read current A & B pin state as bits 1 and 0
  ENC-A io@ %10 and  ENC-B io@ %01 and  or ;

: step ( n -- )  counter +!  7 <pkt counter @ +pkt pkt>rf ;

: read-enc
  IMODE-HIGH ENC-A io-mode!
  IMODE-HIGH ENC-B io-mode!
  OMODE-PP   ENC-C io-mode!  ENC-C ioc!
  rf-init
  led-off 2.1MHz only-msi 65KHz 10 systick-hz

  %11  \ previous state, stays on the stack
  begin
    2 lshift  \ prev pins in bits 3 and 2
    ab-pins tuck  \ new pins, also save as previous for next cycle
    or  \ combines prev-a/prev-b/curr-a/curr-b into a 4-bit value

    \ process this 4-bit value and leave only prev state on stack
    case
      %0001 of -1 step endof
      %0010 of  1 step endof
      %0100 of  1 step endof
      %0111 of -1 step endof
      %1000 of -1 step endof
      %1011 of  1 step endof
      %1101 of  1 step endof
      %1110 of -1 step endof
    endcase
  again ;

: init init unattended read-enc ;

Not bad for a one-line change. Can we do better? First of all, the rotary encoder readout is starting to become erratic again at this very low clock rate. And second: having to insert a fresh coin cell every six months is really still a bit too often. Yes, we can fix both, stay tuned…

So far, the power consumption of the rotary encoder node has been optimised by taking the current draw from 5.0 mA to 45 µA - that’s an estimated coin cell battery life of 6 months.

Unfortunately, this is where diminishing returns start to kick in. To progress beyond this point requires rethinking the node’s logic. We need to put the µC into a really low-power mode, in such a way that it’ll still wake up every time that control dial is touched.

Another tricky aspect, is that once you start fiddling with clock rates and low-power sleep modes, it becomes very difficult to keep track of time. This makes it hard to determine when something happened.

After some contemplation, the following approach came to mind:

use edge-triggered interrupts to capture rotary encoder changes and update the counter
go to sleep for 100 milliseconds at a time, in an infinite loop
send out a packet only when no edges were seen in the last sleep period, and EITHER the counter has changed OR 10s (100x 100 ms) have passed since the last transmission

This will avoid sending out packets in rapid-fire succession (as the current version still does), and will keep the µC in low-power mode while not missing a beat when pulses come in on the PA3 and PA5 GPIO pins. Each edge-triggered interrupt will wake up the µC.
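The sending rule from the list above can be tried out as a small host-side simulation. This is a Python sketch of the decision logic only, with made-up names; it says nothing about how the Forth code is structured. Each entry in `periods` stands for one 100 ms sleep period.

```python
def simulate(periods, heartbeat=100):
    """Apply the send rule to a list of (edges_seen, counter) periods.

    Returns a list of (period_index, counter) for every packet sent.
    """
    sent = []
    last_sent = None        # nothing has been sent yet
    since_last_send = 0     # number of periods since the last packet
    for index, (edges_seen, counter) in enumerate(periods):
        since_last_send += 1
        changed = counter != last_sent
        overdue = since_last_send >= heartbeat
        # send only in a quiet period, and only if there is a reason to
        if not edges_seen and (changed or overdue):
            sent.append((index, counter))
            last_sent = counter
            since_last_send = 0
    return sent

# Two busy periods while the knob turns, then the knob is left alone.
periods = [(True, 1001), (True, 1003)] + [(False, 1003)] * 101
print(simulate(periods))
```

In this example nothing is sent while the knob is moving, one packet goes out in the first quiet period, and a heartbeat packet follows 100 periods (10 s) later.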

Let’s first find out what we get by taking a node into repetitive 100 ms low-power “stop” mode:

begin
  stop100ms
again ;

The result: 9 .. 11 µA (it varies a bit, probably some external effects).

Getting the quadrature pulse decoding working again requires implementing “edge-triggered interrupts”, as described earlier.

Time to try it out - the code is in jz4/ex/rot7.fs on GitHub. This is a first test, which sends the counter value every 10 seconds, and goes into low-power stop mode in between:

: read-enc
  [...]
  led-off 2.1MHz only-msi 1000 systick-hz lptim-init
  count-pulses
  begin
    7 <pkt counter @ +pkt pkt>rf rf-sleep
    stop10s
  again ;

Those 10 seconds in stop mode cause the display to update slowly, but do allow us to check the sleep current reading: it’s around 8 µA - while still responding to the rotary encoder pulses!

Note that this is a bit lower than the stop100ms loop above, because start/stop configuration takes a bit of processing power, and those cycles will also eat up some energy.

The final version of this “Rotary Encoder Node” is in jz4/ex/rot8.fs on GitHub. It draws about 9 µA when idle, captures all control knob changes, sends packets out at most 10 times per second, with a heartbeat send of the counter value every 10 seconds even if it hasn’t changed.

The main loop now looks like this:

0  \ keep previous counter value on the stack
begin
  idle @ if
    counter @ tuck <> if-send  \ send a packet if counter changed
  then
  1 idle +!
  idle @ 100 mod 0= if-send  \ send a heartbeat packet every 10 s
  stop100ms
again

The average current could be reduced further by using longer sleep cycles, since this code still wakes up 10 times per second. This will make the main loop a bit more complicated, however.

Look ma, no hands! (yes… OLEDs always come out strange with short flash shutter times)

Many refinements are possible, for example: including the node’s hardware ID in the payload to support multiple nodes running at the same time, and adding the estimated battery voltage to give an indication of when it might be time to replace the coin cell.

But all in all, this has accomplished its goal: a JeeNode Zero which needs no on-off switch because it’s expected to run well over 2 years before its coin cell needs to be replaced. Not bad for under 70 lines of Forth code (and of course several more from the generic “flib” library).

It turns out that a ROM-based serial upload with Folie takes about 22 seconds for a standard Embello install (Mecrisp + always/board/core). While this is fine for occasional re-flashing, it adds quite a delay for production, i.e. when an image needs to be loaded onto each new board.

But the STM32L05x series µCs have another trick up their sleeve:

The ROM boot loader also checks for SPI requests on pins PA4..PA7, and on the JeeNode Zero rev4, all these pins are available on the main header. Could we bypass serial and use SPI?

Let’s find out, using a Blue Pill as programmer, since it has two SPI buses (this might come in handy later, with a radio on the other bus, for example).

The “quick loader” code for this is a bit involved, see 1608-forth/qld/dev.fs on GitHub.

We need the following pins connected on a JNZ:

power, i.e. +5V and ground
SPI, i.e. PA4 .. PA7
RESET and BOOT0 on the FTDI header

The latter are needed to start ROM boot mode: keep BOOT0 high, while pulsing RESET low.

This test does a “fake” upload, in that it goes through all the steps, but sends dummy data:

: uploader ( -- f )
  boot-mode
  sof check-ack
  get-cmd hex . decimal
  get-id hex . decimal
  rd-unp
  wr-unp
  512 erase  \ erase all 64 KB ...
  320 0 do   \ ... but program only 40 KB
    0 i 128 * $08000000 + pgm
  loop ;

To run it, we enter “boot-init uploader”. The uploader will also fetch and print the boot loader version and µC type ($417). Measuring the time it takes, we get ≈ 5 seconds.

Those 5 seconds are not so bad: according to the datasheet, both erasing and programming one 128-byte flash page takes 3.3..4.0 ms. Worst case, this is: 2 x 512 pages x 4 ms/page = 4 s.

Now we can work out Folie’s inefficiency for this operation: it inserts some delays and going through a 115200 baud link slows down the process a bit, adding ≈ 17 seconds of overhead.
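The numbers above are easy to check. This Python sketch just redoes the arithmetic, using the 128-byte page size, the 4 ms worst-case figure, and the two measured upload times from the text:

```python
PAGE_BYTES = 128
FLASH_PAGES = 64 * 1024 // PAGE_BYTES   # number of pages in 64 KB of flash
PAGE_OP_MS = 4                          # worst-case erase or program time per page

# Worst case: erase every page, then program every page.
worst_case_ms = 2 * FLASH_PAGES * PAGE_OP_MS

SERIAL_UPLOAD_S = 22   # measured: ROM-based serial upload with Folie
SPI_UPLOAD_S = 5       # measured: "fake" upload over SPI
overhead_s = SERIAL_UPLOAD_S - SPI_UPLOAD_S

print(FLASH_PAGES, worst_case_ms, overhead_s)
```

That is 512 pages, 4096 ms for the worst-case flash operations, and 17 seconds of difference between the serial and SPI uploads.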

There is one more way to perform fast uploads: through JTAG/SWD. This too bypasses the serial port and needs just a few pins (SWCLK, SWDIO, RESET, and power). Stay tuned…

The Raspberry Pi does not really need an introduction: Linux plus some tinkering pins - who could possibly ask for more? It has all the features needed to create a flexible and powerful programming / debugging tool for microcontrollers. It can even run entire toolchains for cross-compiling for a wide variety of these µCs.

One tricky aspect, which is probably the main stumbling block if you’ve ever looked into trying out things with a 32-bit ARM-based µC, is how to get the software onto those chips. There are many ways, well-researched on this weblog and elsewhere, but they all have little quirks - from trouble with connecting everything together, to not being portable across Win/Mac/Lin, to requiring a special programmer - it quickly turns into a chicken-and-egg kind of adventure:

YOU ARE IN A MAZE OF TWISTY LITTLE PASSAGES, ALL ALIKE

Here is yet another setup, which requires nothing other than a working Raspberry Pi (to be called “RasPi” from now on) - of which there are millions by now. Best of all, it’ll work with any RasPi model (and probably most compatible alternatives), no matter how old or limited:

We’ll use nine pins on the RasPi’s header - all in the first 26 pins, i.e. present on all models:

Function	RasPi name	Header Pin
+3.3V power	-	1
+5V power	-	2, 4
Ground	-	6, 9, 14, 25
Serial out	TX	8
Serial in	RX	10
µC RESET	GPIO 18	12
µC BOOT0	GPIO 23	16
µC SWDIO	GPIO 24	18
µC SWCLK	GPIO 25	22

Here is an example of how these pins could be wired to a few female headers:

Below is a fully self-contained unit, hacked together from parts lying around here at JeeLabs. The tape keeps some unused wires from the 26-core flat cable out of harm’s way:

There’s a small USB power bank (based on the very common 18650 LiPo cell) and in this case also a WiFi dongle, plugged into the only USB port available on this older RasPi 1, model A. Any SD card of at least 2 GB will do, and it’s all meant to be used via SSH, i.e. via the cmd-line.

In the above setup, you can also see a 470 µF capacitor between +5V and GND, and another one of 47 µF between +3.3V and GND, because old RasPi’s are sensitive to power fluctuations. Without them, plugging a µC board into any of the headers can trip it up and reset the RasPi.

For JeeNode Zero boards, with an FTDI pinout including DTR & RTS, we can plug in as follows:

To connect a JNZ for programming over SWD, this will work (note the extra RESET wire):

For a HyTiny F103 board, we need to use its dedicated header (again with extra RESET wire):

And lastly, the Blue Pill board has its own header (yes, it too needs the RESET wire):

The RESET wire is not always needed for programming over SWD. It depends on whether the code running on the µC has disabled SWD - when SWD is disabled, the only way to get control back is to connect RESET. Asserting this signal keeps the µC in reset during programming.

Coming next: setting up the software to turn this RasPi into a general-purpose µC tool…

To turn the Raspberry Pi into a general-purpose uploader / debugger for ARM STM32 chips, we need to set up some software.

First of all - the OS. DietPi is a very practical little distribution these days. It’s minimal, well-supported, offers a simple way to manage lots of popular applications and to configure or update the system itself. All console driven, through simple old-fashioned text menus.

Underneath this all sits a standard Debian 8.0 system, with apt-get and all that jazz.

Most conveniently, DietPi is available for a very wide range of RasPi-like boards:

So the first step is to download the proper image and put it on an SD card. For boards which do not have on-board wired LAN, it’s easy to get WiFi started by editing /boot/dietpi.cfg before ejecting the SD card from the host machine (that boot partition is VFAT and can also be mounted in Windows and MacOS).

If all is well, you can login on the box over the network, using ssh (or Putty for Win-users):

At this point, you’ll be taken to the DietPi “Launcher” (it can also be started manually later):

No need to install extra packages in this first pass through DietPi’s setup. The only confusing part is that even when not installing anything else, you’ll need to go through that last “Install” step in the menu to complete the setup process, and only then should you exit the Launcher.

The entire setup process is well-documented, see DietPi’s Getting Started page. Once you’re up and running, and logged in as root on the RasPi, you will get a greeting similar to this:

───────────────────────────────────────
 DietPi     | 13:30 | Mon 06/03/17
 ───────────────────────────────────────
 V145       | RPi A (armv6l)
 ───────────────────────────────────────
 IP Address | 192.168.188.55
 ───────────────────────────────────────

 Created by : Daniel Knight
 Web        : http://DietPi.com
 Twitter    : http://twitter.com/dietpi_
 Donate     : http://goo.gl/pzISt9
 DietPi's web hosting is powered by: MyVirtualServer.com

 dietpi-launcher  = All the DietPi programs in one place.
 dietpi-config    = Feature rich configuration tool for your device.
 dietpi-software  = Select optimized software for installation.
 htop             = Resource monitor.
 cpu              = Shows CPU information and stats.

From here on, we’ll always use the root account to avoid problems with permissions on GPIO pins, etc. It’s a dedicated board, so it’s no big deal. Just pick a good password and SSH keys.

First, we need to start up dietpi-config and adjust a few settings:

Advanced Options: - serial console disabled, i2c enabled, i2c frequency 400 (kHz)
Security Options: - adjust your root password and host name to your preference
Network Options: Adapters - optionally disable ipv6, if not used

Disabling the serial console does not seem to work, but this extra command will do the trick:

systemctl mask serial-getty@ttyAMA0.service

Time to bring everything up to date and install a few more packages:

apt-get update && apt-get upgrade
apt-get install aptitude i2c-tools stm32flash picocom vim

Now clean up a bit, reboot to start from a fresh power-up state - then log back in again:

apt-get clean
reboot

With these changes, the RasPi will start up with its serial console free for our own use (no getty running, no login prompt), and we can start using this for connecting µC boards - we now have serial, I2C, and SPI at our disposal on the RasPi header.

We can use PicoCom for example:

picocom -b 115200 -imod lfcrlf /dev/ttyAMA0

But we can also download Folie, unpack and move it to /usr/local/bin/, and use that:

folie -r -p /dev/ttyAMA0

If the DTR and RTS pins on the FTDI header have been wired up, we can use stm32flash (installed earlier) to verify their proper operation - here with a JeeNode Zero inserted:

# stm32flash -i 23,-18,18:-23,-18,18 /dev/ttyAMA0
stm32flash 0.4

http://stm32flash.googlecode.com/

Interface serial_posix: 57600 8E1
Version      : 0x31
Option 1     : 0x00
Option 2     : 0x00
Device ID    : 0x0417 (L05xxx/06xxx)
- RAM        : 8KiB  (4096b reserved by bootloader)
- Flash      : 64KiB (sector size: 32x128)
- Option RAM : 16b
- System RAM : 4KiB

Next, we’ll install OpenOCD. It’s particularly useful on RasPi, because it can toggle GPIO pins to create a JTAG/SWD programmer and debugger. It’s highly configurable and supports gdb.

The default OpenOCD package in Debian is version 0.8, but 0.10 is a better choice for SWD on RasPi - so we’ll build it ourselves from source. There’s an excellent description of all the steps involved. In summary, we need to enter the following commands - this will take a while:

apt-get install git autoconf libtool make pkg-config libusb-1.0-0-dev telnet
mkdir -p ~/src; cd ~/src
git clone git://git.code.sf.net/p/openocd/code openocd
cd openocd && ./bootstrap
./configure --enable-maintainer-mode --enable-bcm2835gpio --enable-sysfsgpio
make && make install   # use "make -j4" on RasPi ≥ 2 to speed things up

Once OpenOCD is installed, we can set up some scripts and configuration files to upload a hex firmware image easily to an attached microcontroller. Note that this is not limited to the STM32L052 of the JeeNode Zero or the STM32F103 µC series - OpenOCD is considerably more advanced and generalised than that.

Below are a number of scripts which you can add to the home directory, i.e. /root/ to create a basic upload structure. The hex firmware images are expected to be in folders named images-l0/ and images-f1x/, and the 2nd argument to burn.sh and burns.sh is the image name.

Assuming there’s a file called $HOME/images-l0/jz4.hex, this will upload it via SWD:

./burn.sh l0 jz4

And this variant can be used to upload over serial with DTR+RTS pin toggling:

./burns.sh l0 jz4

Best of all, if you have set up SSH access to your RasPi box, then all of this can be done without even logging in, using commands such as:

ssh myraspi ./burn.sh l0 jz4
ssh myraspi ./burns.sh l0 jz4

Below are the scripts which make this possible, shown inline but provided as a Gist by GitHub:

There you go - hook it all up, install the above software, and you’ll be ready for any µC task!

There’s been a multi-tasker hiding in the Embello repository for some time now. It’s a small variation of the one provided as part of the Mecrisp distribution, also on GitHub.

The multi-tasker lets us do multiple things at once, or more precisely: it can quickly switch between tasks, each with their own data and return stacks, and their own program counter. The result is that you can write a thread of code as if nothing else is of interest… as we’ll see.

But this is not quite the multi-tasking you might expect from an RTOS, which usually does everything based on interrupts from timers and other hardware peripherals. The multi-tasking traditionally used in Forth is cooperative, not pre-emptive: instead of using interrupts to force tasks to relinquish control, cooperative multi-tasking relies on tasks voluntarily passing control to other tasks from time to time. If everyone plays nice, everyone will get a fair deal.

This has some major implications:

cooperative multi-tasking is much simpler (under 100 lines of Forth)
there is no risk of being interrupted, your code always runs to completion
the only time when a task loses control is when it calls pause
it is upon the programmer to make sure that this happens “often enough”
tasks are either “active” (running every so often) or “idle” (not running)
interrupts can be used to change the active/idle state of any task
new tasks can be dynamically added or removed from the list of all tasks
lastly, the multi-tasker can be started and stopped at any time
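
To make the idea of “voluntarily passing control” concrete, here is a small desktop model in Python (not the multi.fs code, and not Forth). Each task is a generator, and yield stands in for pause. The names blinker and run_round_robin are made up for this sketch:

```python
# A desktop model of cooperative multi-tasking, NOT the multi.fs code:
# each task is a generator, and "yield" plays the role of Forth's pause.

def blinker(name, log, count):
    """Hypothetical task: records a step, then gives up control."""
    for i in range(count):
        log.append(f"{name}{i}")   # do a little work
        yield                      # the equivalent of calling pause

def run_round_robin(tasks):
    """Visit the tasks in circular order until all of them have finished."""
    active = list(tasks)
    while active:
        for task in list(active):  # iterate over a copy, we may remove items
            try:
                next(task)         # run the task up to its next yield
            except StopIteration:
                active.remove(task)

log = []
run_round_robin([blinker("a", log, 2), blinker("b", log, 3)])
print(log)   # ['a0', 'b0', 'a1', 'b1', 'b2']
```

Note how a task which never yields would starve all the others: that is the “play nice” requirement in a nutshell.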

There’s a lot to take in when it comes to even just this simple form of multi-tasking, but as you will see, it’s very effective and an excellent match for Forth’s interactive nature. Let’s dive in!

Here’s a small example to blink an LED on pin PA12 as a background task:

PA12 constant LED1
OMODE-PP LED1 io-mode!

task: blinker

: blink& ( -- )
  blinker activate
  begin
    LED1 iox!   \ toggle LED1
    200 ms      \ wait 200 ms
  again ;

So far, nothing has been activated: this merely sets up the LED, defines a task called blinker, and defines a word which becomes the “handler” for this task when it’s running. So blinker is the task descriptor, while blink& is the actual task code and logic.

There is always one boot task, as we can see by entering the tasks command:

Task @ 20004AC0 Next: 20004AC0 State: FFFFFFFF Stack: 00000000 Handler: 00000000

To insert the blinker task into the task list, we need to call blink&, then tasks will show:

Task @ 20004AC0 Next: 20000BA8 State: FFFFFFFF Stack: 20000278 Handler: 00000000
Task @ 20000BA8 Next: 20004AC0 State: FFFFFFFF Stack: 20000CB8 Handler: 00000000

Note how the “Next” chain is a circular list. But… wait a minute… the LED is not blinking!

Oops, we need to enable multi-tasking: Forth is listening for commands and executing them, but multi-tasking dispatch has not been enabled yet. To start it up, type multitask - the LED starts blinking and our command prompt is still responsive. Forth is now dual-tasking!

To disable the multi-tasker, enter singletask. This is a very useful feature: when developing, you often need to stop all the background activity - especially if it’s generating its own output.

For a larger example, see g6s/ex/tasks.fs, which starts up 4 tasks in the background, each controlling its own LED and at a different rate.

There’s a stop word, which causes the current task to make itself inactive (idle). If you type stop at the command prompt, you’ll have deactivated the command prompt, but the LED(s) will keep on blinking. There are ways to get the prompt back, but it’s usually not a good idea.

Recall that this is cooperative multi-tasking, which only switches tasks when pause is called - so how come it’s switching tasks, even though there are no calls to pause in our code?

Well, the calls are in there: so far, pause is called in two places in the embello source code:

when calling “ms”, i.e. when requesting millisecond delays (but not in “us”)
when calling “key?” (or “key”, which calls key? internally)

Knowing exactly where these pause calls are made is essential if you need to stay in control in your code. The above two cases are fairly logical ones, since they both indicate that a task has nothing else to do, at least for a little while. Eventually, it’ll be given control again.

And that’s really the gist of it: with just the above very basic introduction to the Forth multi-tasker, you can start to write code which does a lot of things at “more or less” the same time. Each of the tasks can have loops and perform its work as if nothing else needs to happen - as long as you make sure that pause gets called regularly (within a few milliseconds is usually fine, but it really depends on the application).

Multi-tasking is a great mechanism, but there is a drawback: each task needs its own stack. In the case of Forth, it’s even worse because each task needs both a return stack and a data stack.

In its current configuration, multi.fs is set to 64 elements on each stack, which translates to a whopping 512 bytes of RAM needed (plus about 20 bytes for the task descriptors) - per task!

On an embedded µC, with 8 .. 20 KB of RAM, this really isn’t very convenient.
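
The arithmetic behind those numbers is easy to check. The sketch below is in Python and assumes 4-byte stack elements (a 32-bit ARM) and takes “about 20 bytes” per descriptor at face value. The max_tasks figure is only an upper bound, since real applications need RAM for other things too:

```python
CELL_BYTES = 4          # one stack element on a 32-bit ARM (assumption)
STACK_CELLS = 64        # per stack, as configured in multi.fs
DESCRIPTOR_BYTES = 20   # "about 20 bytes", per the text

def task_ram(cells=STACK_CELLS):
    """RAM per task: data stack + return stack + descriptor."""
    return 2 * cells * CELL_BYTES + DESCRIPTOR_BYTES

def max_tasks(ram_bytes):
    """Upper bound on tasks if ALL RAM went to task stacks (it never does)."""
    return ram_bytes // task_ram()

print(task_ram())             # 532
print(max_tasks(8 * 1024))    # 15
print(max_tasks(20 * 1024))   # 38
```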

Stacks are needed when you want to write each task as if it ruled the world, so to speak: run in loops, and leave state on the data and return stacks while going through the logic of the code.

But there is another way to get a lot of independent work done: callbacks. And while forcing callbacks onto an application (as NodeJS does) is not a good idea, there are nevertheless many cases where just a few callbacks can easily handle things.

The difference between callbacks and tasks comes down to not leaving state on the stack: in a callback world, something triggers your code, it does its thing, and then it returns. If it needs to do more, it can schedule additional callbacks, usually via some sort of timer mechanism. The key here is the word “return”: callbacks cannot leave stuff on either stack, they need to manage all state between invocations elsewhere, i.e. in variables and buffers.

Meet the timed module and its documentation, both written and generously contributed by Thomas Lohmüller (@tht on GitHub). With timed, calling your code some time in the future, either once or periodically, is a piece of cake.

Here is the multi-tasking demo from the previous article, now using a timer instead of a task:

include ../flib/any/timed.fs

PA12 constant LED1
OMODE-PP LED1 io-mode!

timed-init

: blink ( -- ) LED1 iox! ;  \ toggle LED1

' blink 200 0 call-every    \ set up a periodic callback

That’s all there is to it. By default, we have eight timer “slots”, and in the above code, we’ve set up a periodic callback every 200 ms in slot zero (“' blink” means “the address of blink”).

For a slightly larger example with 4 LEDs blinking at different rates, see ex/timers.fs.

But here’s the interesting bit of how this has been implemented:

the timed package defines and starts a task (!) to run in the background
so we can still stop all timers by entering singletask, as with tasks
timing will only be accurate if all other tasks play nice and call pause regularly

In this demo there is not much difference evident, but when you have a lot of activities and timeouts, the timed module can help manage it all. It can be configured to handle any number of timers (and it’s much more lightweight than using task stacks: a timer uses only 16 bytes).

So what this design does, is simply to merge the two concepts: we’re still using the cooperative multi-tasker to create the illusion of parallelism, but now the simple one-shot and periodic callbacks are all contained within a single task, avoiding the per-task memory and switching overhead. The timed background task merely keeps track of what to call back, and when, and offers three simple words to manage these activities in whatever way you need:

call-after sets up a one-shot call in a specified slot
call-every sets up a periodic call in a specified slot
call-never cancels the callback associated with a specified slot
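
For those who like to see the bookkeeping spelled out, here is a rough Python model of such a slot table. The method names mirror the three words above, but the internals (the tuple layout, the poll method, passing in the current time) are assumptions made for this sketch, not a description of how timed.fs is written:

```python
# Model of a slot-based timer table; the internals are an assumption,
# not the timed.fs implementation.

class Timed:
    def __init__(self, slots=8):
        # each slot holds (callback, next_due_ms, period_ms) or None
        self.slots = [None] * slots

    def call_after(self, callback, delay_ms, slot, now=0):
        self.slots[slot] = (callback, now + delay_ms, 0)       # one-shot

    def call_every(self, callback, period_ms, slot, now=0):
        self.slots[slot] = (callback, now + period_ms, period_ms)

    def call_never(self, slot):
        self.slots[slot] = None

    def poll(self, now):
        """Run every callback that is due: the background task's job."""
        for i, entry in enumerate(self.slots):
            if entry is None:
                continue
            callback, due, period = entry
            if now >= due:
                # re-arm periodic timers, clear one-shots
                self.slots[i] = (callback, due + period, period) if period else None
                callback()

events = []
t = Timed()
t.call_every(lambda: events.append("blink"), 200, 0)
t.call_after(lambda: events.append("once"), 300, 1)
for now in range(0, 601, 100):      # simulated clock: 0, 100, ... 600 ms
    t.poll(now)
print(events)   # ['blink', 'once', 'blink', 'blink']
```

The callbacks return right away and keep nothing on a stack between calls, which is why a slot can be so much smaller than a task.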

See the documentation page for examples and additional details.

With multi-tasking and timers, we now have some nice tools to deal with more complex tasks.

One of the examples in the multi.fs code contains this little gem:

: sleep ( -- ) [ $BF30 h, ] inline ; \ WFI Opcode, enters sleep mode

task: lowpower-task

: lowpower& ( -- )
  lowpower-task activate
    begin
      eint? if \ Only enter sleep mode if interrupts have been enabled
        dint up-alone? if ( ."  Sleep " ) sleep then eint
      then
      pause
    again ;

It’s a task which gets called periodically, like every task in the cooperative multi-tasker, and then checks if it’s the only enabled task, i.e. not set to idle. If so, it puts the ARM µC in “sleep” mode, a simple trick which halts the processor until the next interrupt. In other words, when there is nothing to do: pause until the next interrupt instead of frantically twiddling thumbs!

Sleep mode cuts power consumption roughly in half, so it’s not a huge win, but the interesting aspect is that it does not affect normal operation of the code: applications can take advantage of this without change. All they need to do is put tasks into idle mode when… idling!

For ultra low-power nodes, we’ll need to take this a lot further. We need to put the µC into “stop” (or even “standby”) mode, where all the main clocks are stopped. This has far more impact on the application: when clocks are stopped, the application loses all sense of time - not so great when you want to do a few things periodically.

Can we have an architecture whereby the application continues to think in terms of timers, periodic actions, and callbacks, yet also make the µC go into these really low-power modes whenever there is nothing to do?

This is where the timer task presented in the previous article could come into play. What if we were to extend it a bit as follows:

for short timers (one-shot and periodic), nothing changes
when the next timer is known to fire more than 10 ms into the future, we enter stop mode for that amount of time instead of just idling

The key benefit of managing all timers in a single task, is that it becomes the sole place in the application which needs to track the passage of time. That means it could figure out exactly when the next callback needs to be triggered.
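
As a thought experiment, the decision could look something like this Python sketch. Only the 10 ms threshold comes from the list above; the function names, the mode labels, and the “forever” case for when no timer is armed are all made up here:

```python
STOP_THRESHOLD_MS = 10   # from the text: only use stop mode beyond 10 ms

def next_due(due_times):
    """Earliest pending deadline in ms, or None if no timer is armed."""
    return min(due_times) if due_times else None

def plan_sleep(now, due_times):
    """Return (mode, duration_ms), with mode 'idle', 'stop' or 'forever'."""
    due = next_due(due_times)
    if due is None:
        return ("forever", None)        # assumption: only an interrupt wakes us
    wait = max(0, due - now)
    if wait > STOP_THRESHOLD_MS:
        return ("stop", wait)           # deep sleep until the next timer
    return ("idle", wait)               # short wait: keep the clocks running

print(plan_sleep(1000, [1005, 1500]))   # ('idle', 5)
print(plan_sleep(1000, [1500, 3000]))   # ('stop', 500)
```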

We’ll probably need to take care of some other details to make this work well:

all tasks should be set to idle when there is no work for them
hardware µC interrupts will need to wake up the task that will handle them

Note that when the µC is in stop mode, some of its interrupt capabilities are in fact disabled. The UART for example, is likely to be comatose, so interrupts won’t even be generated. We could work around this by setting up a falling edge interrupt on the RX pin, to wake up on incoming data - even if that means losing the first character(s), as the µC springs back to life.

Here is an example, again from the multi.fs package, which illustrates how the command prompt task can easily be put to sleep:

0 variable seconds
task: timetask

: time& ( -- )
  timetask background
    begin
      key? if boot-task wake then
      1 seconds +!
      seconds @ . cr
      stop
    again ;

time& lowpower& tasks

: tick ( -- ) timetask wake ;

 ' tick irq-systick !
 16000000 $E000E014 ! \ How many ticks between interrupts ?
        7 $E000E010 ! \ Enable the systick interrupt.

stop \ Idle the boot task

It’s a fairly complex bit of code, but here’s the essence of what it does:

define a new task called timetask
the code for this task increments seconds, then stops itself, forever
the SysTick handler is set to run once a second, and wake up the task
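
The two magic numbers written to $E000E014 and $E000E010 can be decoded as follows. The register addresses and bit meanings are those of the standard ARM Cortex-M SysTick timer, and so is the fact that the counter takes reload + 1 cycles per period. The 16 MHz clock in this Python sketch is an assumption, chosen because it makes the period come out at one second:

```python
SYST_CSR = 0xE000E010    # SysTick control and status register
SYST_RVR = 0xE000E014    # SysTick reload value register (24 bits wide)

def decode_csr(value):
    """Name the three low control bits of SYST_CSR."""
    return {
        "ENABLE":    bool(value & 1),   # counter running
        "TICKINT":   bool(value & 2),   # raise an interrupt on reaching zero
        "CLKSOURCE": bool(value & 4),   # 1 = use the processor clock
    }

def tick_period_s(reload, clock_hz):
    """Seconds between SysTick interrupts for a given reload value."""
    if not 0 < reload < (1 << 24):
        raise ValueError("reload must fit in 24 bits")
    return (reload + 1) / clock_hz      # counts reload, reload-1, ... 0

print(decode_csr(7))
print(tick_period_s(16_000_000, 16_000_000))   # just over 1 s at 16 MHz
```

One thing to watch out for: 16,000,000 only just fits in the 24-bit reload register (the maximum is 16,777,215), so a one-second tick is not possible this way at much higher clock rates.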

Two lines in this code are very special in this context:

key? if boot-task wake then
stop \ Idle the boot task

Looking at that last one: entering stop on the command line puts the command-processor in Mecrisp Forth to sleep. By doing this, we’ve disabled the prompt and we’ve lost control!

However, the other task still running is timetask, and it’s being woken up every second by the SysTick interrupt handler. Since timetask checks for new input using key?, it can bring the command prompt back when fresh input needs attention. Now we’re back in business!

This is by no means the only way to deal with the command prompt in low-power scenarios, but it illustrates that there are ways to have our cake and eat it too: an application which can enter low-power modes, yet retain the ability to listen to the serial port and jump back into interactive mode when needed.

So far, these considerations are preliminary and not exhaustively tested. But hey, you gotta start somewhere when trying to come up with a foundation for ultra low-power nodes, eh?

Those little plastic µSD-to-SD card adapters, of which you may have a bunch lying around since they are often included with new µSD cards, make excellent µSD card sockets:

There are many libraries (in C for the Arduino, for example) which support accessing SD cards from an embedded µC. These all rely on a well-known feature of these cards of supporting SPI:

The only tricky bit is getting them into that mode. For this, the SPI clock has to be temporarily lowered to 100..400 kHz, and then a few magic pin toggles and byte sends will do the rest.

There is a superb description of all the details at www.elm-chan.org.

And now, there is an sdcard.fs package in the Embello repository which implements this for Mecrisp Forth. Keep in mind that it’s young and still has a few weak spots:

it assumes SPI1 is used, and has hard-wired the necessary clock slow-down
it has only been tested on 2 GB µSD cards (4 GB and above will probably not work)

But apart from that, it works like a charm. Just call sd-init to initialise SPI and connect to the card. After that, you get three simple words to use it:

sd-size will return the size of the card (in 512-byte blocks)
sd-read takes a block number and reads that block into a buffer called sd.buf
sd-write also takes a block number and writes sd.buf to the SD card

Reading and writing each take 1 to 2 ms - and that’s all there is to it!

But one of the key attractions of SD cards, is that they can make a very easy data interchange possible with the “bigger” computers out there: no doubt also due to digital cameras (for SD) and mobile phones (for µSD), many modern laptops now include an SD card slot.

And this is where it gets complicated… how do you treat a bunch of blocks as a file-system?

Again, thanks to the wonders of digital photography, this has all been solved long ago: the FAT file system became famous in MS-DOS, but is in fact an evolution of the CP/M file system.

FAT was introduced some 40 years ago and then evolved to VFAT (supporting larger disks), LFN (removing the 8.3 filename restriction), and ending with exFAT (which feels more like an attempt to secure a new licensing model than anything else), 10 years ago.

As it turns out, if we keep things simple, even a µC can handle such a file system with ease.

The sdcard.fs package mentioned earlier has been extended to support reading and writing files under the following constraints:

the SD card must be formatted as FAT16 (the default for storage media up to 2 GB)
LFNs are not supported, but pose no problems: the special entries are ignored
subdirectories are ignored, only files in root can be accessed by the current code
only existing files up to 8 MB can be read and written
files can be read and written, but they cannot grow or shrink
in other words: we can easily overwrite existing data blocks, but not much more

This may all seem a bit limiting, but there are cases where that’s already sufficient to be of use. And of course anyone is free to take the code and extend it with more powerful capabilities.

Again, a few Forth words is all it takes to expose this functionality:

sd-mount takes an initialized SD card and analyses the file system for further use
sd-mount. (note the extra dot) does the same, but also prints out some details
ls displays a list of all entries in the root directory
fat-find finds an entry by name, and returns its starting cluster
fat-chain stores the file access map in one of a small number of “open file slots”
fat-map takes a logical file block and slot, and returns a physical SD block number

The following example inits the SD card, mounts the FAT file system, looks up a file called “READ.ME”, opens the file chain as slot 3, and reads block #0 (512 bytes) into sd.buf:

: fatdemo ( -- )
  sd-init sd-mount
  s" READ    ME " drop fat-find 3 fat-chain
  0 3 fat-map sd-read ;
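
What happens behind fat-chain and fat-map can be modelled in a few lines of Python. This follows the general FAT16 rules (cluster numbers start at 2, and a FAT entry of $FFF8 or higher ends a chain), but it is only a sketch of the idea: the function names and all the numbers in the example are made up, and the actual sdcard.fs code may well do things differently:

```python
END_OF_CHAIN = 0xFFF8    # FAT16: values at or above this end a chain

def cluster_chain(fat, first_cluster):
    """Follow the FAT from a file's first cluster to its end."""
    chain = []
    cluster = first_cluster
    while 2 <= cluster < END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]          # each entry points to the next cluster
    return chain

def map_block(chain, logical_block, data_start, blocks_per_cluster):
    """Translate a logical 512-byte file block into a physical block number."""
    index, offset = divmod(logical_block, blocks_per_cluster)
    cluster = chain[index]
    # cluster numbering starts at 2, hence the "- 2"
    return data_start + (cluster - 2) * blocks_per_cluster + offset

# made-up numbers: a fragmented 3-cluster file, 4 blocks per cluster
fat = {5: 6, 6: 9, 9: 0xFFFF}
chain = cluster_chain(fat, 5)
print(chain)                          # [5, 6, 9]
print(map_block(chain, 0, 600, 4))    # 612
print(map_block(chain, 9, 600, 4))    # 629
```

Once the chain has been collected, mapping a block is plain arithmetic, with no further FAT reads needed.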

This really is a very rudimentary implementation at the moment:

you have to know how long the file is, the code does not keep track of the end
filenames have to be passed as fixed-size 11-byte strings that omit the dot
there are only 4 slots, i.e. at most 4 open files can be used at any point in time
there’s almost no error checking: it’s best to treat this as proof-of-concept for now
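
The 11-byte name format is the one FAT uses in its directory entries: 8 characters for the name and 3 for the extension, each padded with spaces, and no dot. Here is a little Python helper (hypothetical, just for preparing such strings on a desktop) which does the conversion:

```python
def fat_name(filename):
    """Convert 'NAME.EXT' into the 11-byte, space-padded directory form."""
    base, _, ext = filename.upper().partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError("not a valid 8.3 filename: " + filename)
    return base.ljust(8) + ext.ljust(3)

print(repr(fat_name("READ.ME")))     # 'READ    ME '
print(repr(fat_name("data.txt")))    # 'DATA    TXT'
```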

But hey, it’s a start and it makes it possible to exchange data in terms of files on an SD card!

Before going into interrupts, why they’re needed, and why they are tricky, let’s first look into an example which does not use interrupts: writing a pass-through USB-to-serial application.

Note that many of the observations and issues that follow apply to any procedural language!

Here’s a simple loop to pass characters from the UART to USB, and from USB to the UART:

: run
  uart-init  19200 uart-baud
  begin
    uart-key? if uart-key emit      then
    key?      if key      uart-emit then
    \ do other things here...
  again ;

From this layout, you can see that it’s a symmetrical process: whatever comes in on one port, gets emitted out the other. Easy stuff, right? And indeed: the above code does work.

Sort of…

The problem is that there will be speed differences, with characters potentially coming in a lot faster than the serial port can handle. This will cause the uart-emit call to block, preventing the other side of the transfer from proceeding while the UART is busy sending.

As a result, data coming into the UART will get dropped, since it won’t be read out in time before more characters come in. With a fancy term: the incoming UART feed does not support back-pressure (since neither hardware nor software handshaking has been implemented).

We can fix that with the help of the multi-tasker:

task: uart-task

: uart-reader&
  uart-task activate
  begin
    uart-key? if uart-key emit then
  again ;

: run
  uart-init  19200 uart-baud
  multitask uart-reader&
  begin
    key? if key uart-emit then
    \ do other things here...
  again ;

Now, a separate background task is started before the main loop, copying data from UART to USB. The main loop only handles the reverse flow, and sure enough: we’ve solved the blocking issue!

Well, sort of…

This simple example will work just fine, because the processor is not doing anything else. But at say 115,200 baud, a new byte can arrive every 87 µs or so (10 bit times of about 8.7 µs each), and we have to read out each one before the next one is complete.

That’s where interrupts come in: instead of polling the UART all the time, to check whether a byte has been received, we can configure it to generate an interrupt each time this happens. The code for this has already been written (for F103 and for L052). Being layered on top of the polled version, each of these interrupt-based variants is in fact under two dozen lines of code.

Note that we still can’t prevent the data from arriving at the UART receive port at full speed. The only benefit of interrupts here, is that we can immediately store it in a (ring) buffer, and collect more bytes until our application is willing and able to process them.
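
In case ring buffers are new to you, here is a minimal Python model of one. It is not the code used in the Forth library, just the general technique: the interrupt side only advances head, the application side only advances tail, and one position is left unused so that “full” and “empty” can be told apart. What to do when the buffer is full (drop the new byte, here) is a choice made for this sketch:

```python
class RingBuffer:
    """Fixed-size FIFO: the interrupt side puts, the application side gets."""

    def __init__(self, size=128):
        self.data = bytearray(size)
        self.size = size
        self.head = 0      # next position to write (interrupt handler)
        self.tail = 0      # next position to read (application)

    def pending(self):
        """Number of bytes waiting to be read."""
        return (self.head - self.tail) % self.size

    def put(self, byte):
        """Store one byte; returns False (byte dropped) when full."""
        if self.pending() == self.size - 1:
            return False
        self.data[self.head] = byte
        self.head = (self.head + 1) % self.size
        return True

    def get(self):
        """Remove and return the oldest byte; call only if pending() > 0."""
        byte = self.data[self.tail]
        self.tail = (self.tail + 1) % self.size
        return byte

rb = RingBuffer(8)
for ch in b"hello":            # five "interrupts" arrive back to back
    rb.put(ch)
out = bytes(rb.get() for _ in range(rb.pending()))
print(out)                     # b'hello'
```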

The code with interrupt handling and a 128-byte buffer is delightfully similar to the original:

task: uart-task

: uart-reader&
  uart-task activate
  begin
    uart-irq-key? if uart-irq-key emit then
  again ;

: run
  uart-irq-init  19200 uart-baud
  multitask uart-reader&
  begin
    key? if key uart-emit then
    \ do other things here...
  again ;

Now our application can take over 100 times as long between calls to pause as before, and we won’t lose any data. Interrupts will quickly stash away each incoming byte, and that’s it!

Yeah, more or less…

But this code is still based on a couple of constantly-polling loops, consuming lots of idle CPU cycles - so yes, although it does work fine, it’s not such a great approach for low-power nodes.

There’s one more refinement we can easily add to this: instead of having the background task poll for new data in the buffer, we can wake it up when filling the input buffer. What we need to do is replace the interrupt handler with a slightly more advanced one.

The trick is to replace this one line of code in uart-irq-init:

['] uart-irq-handler irq-usart2 !

But instead of changing the library, let’s simply replace the handler with our own version:

[: uart-irq-handler uart-task wake ;] irq-usart2 !

It does everything the original interrupt handler does, but it also wakes up uart-task every time an interrupt triggers. Here is the final code, with uart-task sleeping most of the time:

task: uart-task

: uart-reader&
  uart-task background
  begin
    begin uart-irq-key? while uart-irq-key emit repeat
    stop
  again ;

: run
  uart-irq-init  19200 uart-baud
  [: uart-irq-handler uart-task wake ;] irq-usart2 !
  multitask uart-reader&
  begin
    key? if key uart-emit then
    \ do other things here...
  again ;

The key enabler here, is that wake (and idle, as used inside stop) are “interrupt-safe”: they can be used from inside interrupt handlers. That’s what makes the above architecture possible.

There is one detail which needs to be mentioned: note that every UART receive interrupt will wake up uart-task, but that it won’t run right away, since the multi-tasker is cooperative.

It’s up to the app to decide when and where to pass control to the multi-tasker (using pause). At times, several interrupts might be triggered before uart-task actually gets a chance to run. Because of that, we must process all pending data before going back to sleep by calling stop.

Is that all there is to it, then?

Yes, it really is. Interrupt handlers still need to be written with great care to avoid affecting variables which the application also uses (in this case the ring buffer), but the beauty of this approach is the clear-cut separation of responsibilities: interrupt handlers should only do what’s time critical (“get that byte out of the UART!”), everything else happens when & where the application is ready for it. And by using stop and wake, we can avoid the frantic polling.

If you consider µCs to be incapable of any “serious” data handling, then you’ll be in for a treat.

The following design was created for an upcoming project, which needs a fairly high-speed path for handling requests and transferring 512-byte blocks of data to and from an SD card.

One option is to act as a slave-side SPI device. Here’s how SPI works, courtesy of Wikipedia:

Note that the master drives the clock which causes both shift registers to exchange each bit.

SPI is stunningly simple and elegant when only two devices are involved. And just by adding an SPI “select” line, the master can signal to the slave when a transaction is complete. After 8 clock cycles one byte will have been transferred from the master to the slave, and one byte will have moved back in the other direction (in many cases, one of the two directions is ignored).
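
The exchange of bits between the two shift registers is easily simulated. This Python sketch assumes the most significant bit goes out first, which is the most common setting, but SPI hardware can usually be configured either way:

```python
def spi_exchange(master_byte, slave_byte):
    """Clock 8 bits, MSB first; returns (master_now_holds, slave_now_holds)."""
    m, s = master_byte, slave_byte
    for _ in range(8):                     # one iteration per clock cycle
        mosi = (m >> 7) & 1                # master shifts its top bit out
        miso = (s >> 7) & 1                # slave does the same
        m = ((m << 1) & 0xFF) | miso       # each side shifts the other's bit in
        s = ((s << 1) & 0xFF) | mosi
    return m, s

print([hex(b) for b in spi_exchange(0xA5, 0x3C)])   # ['0x3c', '0xa5']
```

After 8 clocks, the two bytes have simply swapped places.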

Normal µC hardware will trigger a request every 8 clock cycles, and the edges of that extra SPI “select” line can then be used to delimit the beginning and end of data packets, respectively.

By adding one more pin (let’s call it “BUSY”) from slave to master, the slave can also let the master know when it has processed an incoming request, and is ready to provide a reply.

So all in all, 5 I/O pins are sufficient to send and receive “packets” in both directions between a master and a slave. A very similar mechanism is used between a µC (as master) and an SD card (as slave), in fact - except that the busy signalling there takes place on the MISO pin.

The transfer clock can have an extremely high rate when the signal distance is low, say a few centimeters. You can easily clock at 8 MHz and transfer one byte per microsecond this way.

But there’s a catch: given that the master drives the clock, it’s very easy for the master to only do so when it’s ready to send and receive data. Which is why it’s so easy to implement an SPI master in software. On the master side, SPI transfers are automatically throttled by the µC.

On the slave side, we don’t have that luxury: bits will arrive at a rate we can’t control. In fact, there’s not an easy way for the master to see whether the bits are being received and sent correctly. All the master can do is write the MOSI pin and read the MISO pin on clock edges.

If the maximum speed is low enough, transfers can be handled by polling the SPI peripheral in software, or with an interrupt generated for each byte. But with a clock rate of 8 MHz or more, there won’t be enough time for the CPU to handle this. That’s where DMA comes in: transfers directly between the SPI hardware and a memory buffer, without wasting CPU cycles at all.

With DMA, we can easily handle a byte per microsecond, on a µC like the F103 running at 72 MHz. Since SPI is bi-directional, we will need to have two DMA “channels” enabled at the same time: one to take bytes from SPI and store them in a memory buffer, and one to feed bytes from a second buffer to SPI. This setup must be repeated before each transfer.

The STM32F103’s DMA hardware supports up to 7 transfers concurrently, but only from a fixed mix of peripheral channel allocations:

Let’s use SPI2, for which DMA channels 4 and 5 have to be set up and activated.

Here is the logic which needs to be implemented:

in idle state, SEL is high, BUSY is low, and the clock is not active
the master lowers SEL to start a request
then it sends some bytes, with the clock toggling, and MOSI shifting out the data
at the end of the request, the master raises SEL back to “1”
the slave treats this as a trigger, and raises BUSY to signal it got the request
the master waits for BUSY to drop, while the slave is … busy processing
when done, the slave lowers BUSY, signalling to the master that it’s done
now the master lowers SEL, sends/receives a number of bytes, then raises SEL
this concludes the transaction
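
The sequence above can be written down as a small Python walk-through, recording the SEL and BUSY levels after each step. This only models the signalling, not the SPI transfers themselves, and the names transaction and handle are invented for the sketch:

```python
def transaction(request, handle):
    """Walk through one request/reply cycle; returns (reply, signal trace)."""
    trace = []

    def step(sel, busy, note):
        trace.append((sel, busy, note))

    step(1, 0, "idle")
    step(0, 0, "master sends request")      # request bytes are clocked out
    step(1, 0, "request complete")
    step(1, 1, "slave is busy")
    reply = handle(request)                 # the slave does its processing
    step(1, 0, "slave is done")
    step(0, 0, "master transfers data")     # second transfer, either direction
    step(1, 0, "transaction complete")
    return reply, trace

reply, trace = transaction(b"\x01", lambda req: b"ok")
for sel, busy, note in trace:
    print(sel, busy, note)
```

Note that BUSY only ever changes while SEL is high, i.e. while no transfer is in progress.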

Note that the master is always in control of the transfers (in both directions), the BUSY signal is just used to keep the master waiting while the slave is handling the request.

Note also that the direction of the data in the second transfer depends on the request - it could be transferring data in either direction.

By convention, the first byte from the master will be the request code. In the second transfer, this code should be zero, since it’s not a new request but a concluding transfer of data.

Here is a demonstration of the whole process, as seen with a logic analyser:

On the slave side, the trick is to use the rising edge on the SEL signal as the trigger, using a pin interrupt, which occurs at the end of each transfer. The rest can be handled using DMA, with no involvement of the CPU at all (and hence at ridiculously high speed, if needed).

As in the previous article, we can use an interrupt to trigger on SEL (a pin change interrupt in this case), and then wake up a task created specifically to handle these requests. Without going into the details here, you should nevertheless be able to see that it’s the same trick as before:

[: BUSY ios! 12 bit EXTI-PR ! slavetask wake ;] irq-exti10 !

Every time SEL goes high we trigger this code, which sets BUSY high and wakes slavetask. At this point, multi-tasking takes over, and it really doesn’t matter how long this will take.

The slave task (details to follow in an upcoming article), then contains all the logic to tie SPI’s RX and TX sides to two DMA channels, writing and reading two different buffers - in parallel!

: slave&  \ this task will process all incoming SPI2 requests
  slavetask background
  begin
    stop-dma

    vreqbuf c@ case
      \ ... here is the dispatch code to handle each incoming request
    endcase

    reset-spi2
    restart-dma
    BUSY ioc!
    stop
  again ;

There is one very tricky aspect here (isn’t there always?): in slave mode, the SPI hardware TX side must be fed with the first byte to send out before the actual transfer starts. You can see why from the above master/slave diagram of this article: the moment a master clock pulse comes in, the slave hardware must start sending out the first reply bit - and there is no way for the slave to know in advance when, or at what rate, this will happen. On the slave side, we’re at the mercy of the master’s control of SEL and CLK. We have to always be ready for action.

Note the implicit logic behind all this: on SEL going high, BUSY is raised, the slave task is started, and when it is ready, BUSY is lowered again, with DMA set up for the reply.

This design works surprisingly well: it will support SPI clock rates up to 18 MHz (1/4th of the slave’s system clock!), and only generates two interrupts per request/reply transfer, at which point the CPU gets involved and the slave task is activated to perform some real work.

Apart from that, there is virtually no load on the slave side, it’s all handled by the DMA controller. The CPU is free to do whatever it wants. It could be doing interactive stuff over serial, compiling Forth code, performing SD card I/O… or even all of that at the same time.

Which… is what we’re about to do next. Stay tuned!

A new documentation site

http://embello.jeelabs.org

The JNZ rev4 PCBs are in!

Getting started with a JNZ

Installing more drivers in flash

Setting up a remote node

1. Preparing two JNZ’s

2. Setting up RX & TX nodes

3. Low-power transmissions

4. Unattended operation

5. Running off a coin cell

6. Updating the code

How all those JNZs are tested

Post-mortem of a bug

Connecting a rotary encoder

Cutting the rotary cord

Making an always-on device

Several years on a coin cell?

Faster uploads through SPI

Every possible connection

Setting up the Pi software

Let's try out the multi-tasker

Sometimes, timers are easier

Stay busy, but also sleep a lot

SD cards with FAT files

Interrupts, tamed at last

Tying SPI and DMA together