Lattice ECP5 FPGAs and open-source tool chain

Pieter-Tjerk de Boer, PA3FWM web@pa3fwm.nl

This page documents my (anecdotal) experience in using the Lattice ECP5 FPGAs with the Yosys/NextPNR tool chain, in the hope this is of use to others.

Hardware

The picture shows my home-built hardware. Since I needed the SerDes functionality, which wasn't available in any cheap development board, and not in any non-BGA packaged FPGA, I ended up mounting a BGA chip upside down and wiring it up with 0.1 mm enameled copper wire. It's the square in the center of the picture. The smaller square at the right is an A/D converter, and the metal thing at top-left is an SFP containing a fiber-optic gigabit ethernet interface.

The pinout ("ball-out"?) of the FPGA is not in the datasheet, but is supplied separately on Lattice's website as a .csv file. To aid in wiring, I used a short awk script to make this into a page-size diagram showing the (main) function of each ball, which you can get here; this is for the ECP5UM-45 chip in BGA381 package.
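
As an illustration of that conversion, here is a minimal Python sketch (it is not the original awk script). It assumes the Lattice file has already been reduced to a two-column CSV of ball name and function; the real file has more columns, and the name ball_grid is made up for this example.

```python
import csv
import io
import re

def ball_grid(csv_text):
    """Arrange (ball, function) rows into a text grid, one line per ball row."""
    cells = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        m = re.fullmatch(r"([A-Z]+)(\d+)", row[0].strip())
        if not m:
            continue  # skip header or comment rows
        cells[(m.group(1), int(m.group(2)))] = row[1].strip()
    # Single-letter rows come before double-letter rows (A..Y, then AA, AB, ...).
    rows = sorted({r for r, _ in cells}, key=lambda r: (len(r), r))
    cols = sorted({c for _, c in cells})
    width = max((len(f) for f in cells.values()), default=0)
    lines = []
    for r in rows:
        fields = [cells.get((r, c), "").ljust(width) for c in cols]
        lines.append(f"{r:>2} " + " ".join(fields).rstrip())
    return "\n".join(lines)
```

Printing the returned string in a fixed-width font gives one line per ball row, with columns in numerical order.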

Programming the FPGA is done via the JTAG interface, brought out on one of the pin headers. I've also installed a configuration memory, in my case a Winbond W25Q80DVSSIG, wired up according to the sysCONFIG document. The configuration memory can be programmed via the same JTAG interface through the FPGA. Apparently, the programming interface is standardized among such memories to such an extent that this "just works" -- or I got lucky.

Getting started

For getting started with the tools, I used the Colorlight i5 board with MuseLab's development board available via AliExpress, and Tom Verbeure's detailed step-by-step instructions.

After unplugging the Colorlight module, the MuseLab development board can be used as a JTAG programming interface for ECP5s sitting on other boards, by soldering wires to the MuseLab board's JTAG pins.

The MuseLab board also contains a serial-over-USB interface for debugging, which however has one quirk: hexadecimal bytes 10 and 12 are ignored. (I.e., if your FPGA project sends one of these bytes to the serial port, it never arrives at the host computer.) This is no problem if the debugging data is ASCII text, but it is problematic if it's binary.
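
One possible workaround for binary data is byte stuffing: replace each problematic byte by an escape byte followed by a modified copy. The sketch below is a Python reference model of such a scheme, not something the board or the tools provide. The escape value 0x7D and the XOR with 0x20 are assumptions borrowed from HDLC-style framing, and the FPGA side would need the equivalent of stuff() in Verilog.

```python
ESC = 0x7D                      # escape marker (assumed choice, as in HDLC)
DROPPED = {0x10, 0x12}          # bytes the serial bridge swallows, per the text
NEEDS_ESCAPE = DROPPED | {ESC}  # the escape byte itself must be escaped too

def stuff(data: bytes) -> bytes:
    """Sender side: make sure no dropped byte appears on the wire."""
    out = bytearray()
    for b in data:
        if b in NEEDS_ESCAPE:
            out += bytes([ESC, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Host side: undo stuff(). A trailing lone escape byte is discarded."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                break
            b = nxt ^ 0x20
        out.append(b)
    return bytes(out)
```

The price is a slightly longer stream: each of the three escaped values costs one extra byte.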

PLL

The PLL is documented in FPGA-TN-02200, but that document assumes you use the "official" tools. Using Yosys/Nextpnr, one works at a slightly lower level of abstraction; e.g., one has to set the various dividers manually rather than having them computed automatically from the desired input and output frequencies. Here's an example of using a PLL block to generate a 125 MHz clock clkout from a 72 MHz input clock clk72MHz:

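    EHXPLLL mypll (
       .CLKI(clk72MHz),     // 72 MHz reference input
       .CLKFB(clkout),      // feedback taken from the output itself
       .CLKOP(clkout),      // 125 MHz output
       .ENCLKOP(1),
       .RST(0)
    );
    defparam mypll.CLKI_DIV = "72";      // input divider
    defparam mypll.CLKFB_DIV = "125";    // feedback divider: 72 MHz * 125 / 72 = 125 MHz
    defparam mypll.CLKOP_DIV = "2";      // VCO runs at 2 * 125 MHz = 250 MHz
    defparam mypll.CLKOP_ENABLE = "ENABLED";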

Note that CLKOP_DIV does not directly affect the output frequency, but it does set the PLL's VCO frequency: the VCO runs at CLKOP_DIV times the output frequency, so at 250 MHz in this example. On my chip, the VCO could go down to about 28 MHz, and up to beyond 300 MHz, but at those high frequencies the CLKOP_DIV shouldn't be too high for reliable operation. The official tools of course know about such restrictions; users of the open source tools have to find out experimentally what works.
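
Since the dividers have to be chosen by hand, a small helper can do the arithmetic. The Python sketch below assumes the feedback is taken from CLKOP, as in the example above, so that the output frequency is the input frequency times CLKFB_DIV / CLKI_DIV and the VCO runs at CLKOP_DIV times the output frequency. The function name pll_settings is made up, the divider limit of 128 is an assumption, and the acceptable VCO window has to be supplied by the caller because it is exactly the part that must be found experimentally.

```python
from fractions import Fraction

def pll_settings(f_in_mhz, f_out_mhz, vco_min_mhz, vco_max_mhz, max_div=128):
    """Pick CLKI_DIV, CLKFB_DIV and CLKOP_DIV for a CLKOP-feedback PLL.

    Frequencies should be integers or Fractions to keep the ratio exact.
    max_div=128 is an assumed upper limit for all three dividers.
    """
    ratio = Fraction(f_out_mhz) / Fraction(f_in_mhz)
    clki_div, clkfb_div = ratio.denominator, ratio.numerator
    if clki_div > max_div or clkfb_div > max_div:
        raise ValueError("frequency ratio needs dividers larger than max_div")
    # Take the smallest CLKOP_DIV that puts the VCO inside the given window.
    for clkop_div in range(1, max_div + 1):
        f_vco = f_out_mhz * clkop_div
        if vco_min_mhz <= f_vco <= vco_max_mhz:
            return {
                "CLKI_DIV": clki_div,
                "CLKFB_DIV": clkfb_div,
                "CLKOP_DIV": clkop_div,
                "f_vco_mhz": f_vco,
            }
    raise ValueError("no CLKOP_DIV puts the VCO inside the requested window")
```

Calling pll_settings(72, 125, 200, 300) reproduces the dividers used in the example above.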

The complete list of parameters with their defaults is:

        parameter CLKI_DIV = 1;
        parameter CLKFB_DIV = 1;
        parameter CLKOP_DIV = 8;
        parameter CLKOS_DIV = 8;
        parameter CLKOS2_DIV = 8;
        parameter CLKOS3_DIV = 8;
        parameter CLKOP_ENABLE = "ENABLED";
        parameter CLKOS_ENABLE = "DISABLED";
        parameter CLKOS2_ENABLE = "DISABLED";
        parameter CLKOS3_ENABLE = "DISABLED";
        parameter CLKOP_CPHASE = 0;
        parameter CLKOS_CPHASE = 0;
        parameter CLKOS2_CPHASE = 0;
        parameter CLKOS3_CPHASE = 0;
        parameter CLKOP_FPHASE = 0;
        parameter CLKOS_FPHASE = 0;
        parameter CLKOS2_FPHASE = 0;
        parameter CLKOS3_FPHASE = 0;
        parameter FEEDBK_PATH = "CLKOP";
        parameter CLKOP_TRIM_POL = "RISING";
        parameter CLKOP_TRIM_DELAY = 0;
        parameter CLKOS_TRIM_POL = "RISING";
        parameter CLKOS_TRIM_DELAY = 0;
        parameter OUTDIVIDER_MUXA = "DIVA";
        parameter OUTDIVIDER_MUXB = "DIVB";
        parameter OUTDIVIDER_MUXC = "DIVC";
        parameter OUTDIVIDER_MUXD = "DIVD";
        parameter PLL_LOCK_MODE = 0;
        parameter PLL_LOCK_DELAY = 200;
        parameter STDBY_ENABLE = "DISABLED";
        parameter REFIN_RESET = "DISABLED";
        parameter SYNC_ENABLE = "DISABLED";
        parameter INT_LOCK_STICKY = "ENABLED";
        parameter DPHASE_SOURCE = "DISABLED";
        parameter PLLRST_ENA = "DISABLED";
        parameter INTFB_WAKE = "DISABLED";

Where does this list come from? If you compile any project, a large .json file is generated which contains the list (search in it for EHXPLL), and also a reference to where on your system the master file is found.
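
Instead of searching by hand, the list can be pulled out of that .json file with a few lines of Python. This is a sketch under assumptions: it expects the layout written by recent Yosys versions, with a top-level "modules" dictionary in which each module may have "parameter_default_values" and a "src" attribute. Older versions may not write the defaults, in which case the result is simply empty. The function names are made up for this example.

```python
import json

def pll_defaults(netlist, cell_type="EHXPLLL"):
    """Return the default parameters and source location of a cell type.

    netlist is the already-parsed JSON; returns None if the cell is absent.
    Values are returned exactly as stored in the file, without decoding.
    """
    module = netlist.get("modules", {}).get(cell_type)
    if module is None:
        return None
    return {
        "defaults": module.get("parameter_default_values", {}),
        "src": module.get("attributes", {}).get("src"),
    }

def pll_defaults_from_file(path, cell_type="EHXPLLL"):
    """Same as pll_defaults(), but reads the JSON file at the given path."""
    with open(path) as f:
        return pll_defaults(json.load(f), cell_type)
```

The "src" entry, if present, is the reference to the master file mentioned above.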

Tri-state outputs

A tri-state I/O pin can be described in the standard Verilog way:

  module top (
    ...
    inout wire tristatewire,    // note it says inOut, not inPut
    ...
    );

  ...
  // tristatewire is already declared in the port list, so no separate wire declaration is needed
  assign tristatewire = outputenable ? outputvalue : 1'bz;
  ...

Why am I telling you this? Because Yosys comes with the ominous warning "Warning: Yosys has only limited support for tri-state logic at the moment." However, the result works fine.

Pacoblaze microcontroller

An ECP5 FPGA is big enough to host a LiteX RISC-V processor running Linux, see here and here. However, often one neither wants nor needs that complexity, and a simple 8-bit microcontroller would suffice. From earlier projects using Xilinx Spartan 3 FPGAs, I knew the (Xilinx-specific) Picoblaze microcontroller, and it turns out an open-source clone of this exists called Pacoblaze. It compiles fine for the ECP5 with Yosys after applying a trivial patch ............. to do ...............

SerDes

The SerDes (Serializer/Deserializer) was the most complicated thing to get to work. There is a substantial gap between the Lattice documentation and the interface we have to work with when using the open-source tools. This gap is of course normally filled by functionality of the official tools, which convert high-level settings (such as which protocol to use) into much more detailed settings of the actual chip. Also, I have the impression that part of the functionality promised by the documentation is actually implemented "on the fly" in the freely programmable part of the FPGA when needed, and thus not available to us.

Here's a bunch of observations and remarks:

In much of the documentation and files, the SerDes are referred to as DCU, which stands for Dual Channel Unit. That's because two SerDes share part of the (timing) logic.
Many signals of the SerDes that aren't documented directly can be deciphered by looking at the register list in Appendix A of the Lattice documentation: often these signals have a similarly-named setting in the register file.
There's no need to configure the SerDes's physical input/output wires in the Verilog file (as external I/O of the module) or in the .lpf file (in which you specify which pins/balls to use for normal I/O). Instead, the Verilog code must contain a directive of the form
```
  (* BEL="X69/Y71/DCU" *)
```
to tell which of the DCUs to use (if your FPGA has multiple). In the ECP5UM-45, the two DCUs are at X42/Y71 and X69/Y71 (unfortunately, as I'm typing this a few weeks later, I can no longer find where I got this information).
The transmit clock generation works as follows. You supply some clock, which gets multiplied by either 8, 10, 16, 20 or 25, and then divided by either 1, 2 or 11. There are limits on what the result of the multiplication can be (i.e., the frequency at which the internal oscillator runs); these are not documented, so you may need to try for yourself. A typical example for Gigabit Ethernet over fibre, which has a bitrate (after 8b10b encoding) of 1.25 Gbit/s, would be supplying a 125 MHz clock, multiplying by 20 and dividing by 2. The divide by 11 option is handy for testing, as it brings the bitrate down to a few hundred Mbit/s, which can still be handled by a not-too-exotic oscilloscope.
```
    defparam DCU0_inst.D_REFCK_MODE = "0b000";   // 000..100 = internal high speed bit clock is at 20x, 10x, 16x, 8x, 25x offered clock
    defparam DCU0_inst.CH1_TX_DIV11_SEL = "0b0";    // 1 select divide by 11 transmit clock
    defparam DCU0_inst.CH1_RATE_MODE_TX = "0b1";    // 1 select divide by 2 transmit clock
```
The receive clock works similarly. It shares the multiplication factor with the transmit branch, but has an independent setting for the division factor. Of course, in the end the SerDes will lock its receiver clock to the incoming signal, but apparently it uses the reference clock to check that it's quite near the correct frequency. A low-quality clock from the built-in oscillator (the OSCG element) is not good enough.
The receiver's PLL that tries to lock onto the incoming signal needs to be tuned appropriately. There's a bunch of settings that relate to this, but no documentation. I just tried several values until it worked...
For some reason, I couldn't get gigabit ethernet to work with the transmit clock derived from a crystal oscillator, so I cheated by using the recovered receive clock also for transmission. Of course, this only works fine if only one side does it...
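
The transmit clock arithmetic from the list above can be checked with a few lines of Python. This is only a sketch of the relation as described there: the multiplier table follows the comment on D_REFCK_MODE, the two divider selections are treated as mutually exclusive (an assumption, since the text only mentions dividing by 1, 2 or 11), and the function name is made up.

```python
# Multiplier selected by D_REFCK_MODE, following the comment in the defparam example.
REFCK_MULT = {"0b000": 20, "0b001": 10, "0b010": 16, "0b011": 8, "0b100": 25}

def tx_bit_rate_mbps(ref_mhz, refck_mode, rate_mode_tx, div11_sel):
    """Serial bit rate in Mbit/s for a given reference clock and settings.

    rate_mode_tx=1 selects divide by 2, div11_sel=1 selects divide by 11;
    setting both is rejected here because its effect is not described.
    """
    if rate_mode_tx and div11_sel:
        raise ValueError("combination of divide by 2 and divide by 11 not covered")
    divider = 11 if div11_sel else (2 if rate_mode_tx else 1)
    return ref_mhz * REFCK_MULT[refck_mode] / divider
```

With a 125 MHz reference, mode "0b000" and divide by 2, this gives the 1250 Mbit/s of the Gigabit Ethernet example; selecting divide by 11 instead gives roughly 227 Mbit/s for oscilloscope testing.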

To get started, here's a ready-made example, running a SerDes in gigabit ethernet mode, echoing all incoming packets with a small modification: ecp5-gbe-demo.tgz.

Useful links

Documentation of the .lpf file format: here.