# Ah, So That’s How Probabilistic Constellation Shaping Works!

July 8, 2020
By Paul Momtahan
Director, Solutions Marketing

Conventional coherent transmission is based on quadrature amplitude modulation (QAM), which uses a combination of phase and amplitude to encode bits of data. For example, Figure 1 below shows a 16QAM constellation diagram with 16 constellation points encoding 4 bits each.

Figure 1: 16QAM example: phase and amplitude
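To make the amplitude/phase idea concrete, the sketch below builds a square 16QAM constellation and prints the amplitude and phase of each point. It assumes in-phase (I) and quadrature (Q) levels of ±1 and ±3 in arbitrary units and a Gray-coded bit labeling; both are common textbook choices but are assumptions here, not how any particular transceiver scales or labels its constellation.

```python
import cmath
import math

# Assumed Gray-coded levels for one quadrature: 2 bits select 1 of 4 levels,
# and neighboring levels differ in exactly one bit.
GRAY_LEVELS = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}


def qam16_constellation():
    """Return {4-bit label: complex point} for a square 16QAM constellation.

    The two high bits pick the in-phase (I) level and the two low bits pick
    the quadrature (Q) level.
    """
    points = {}
    for label in range(16):
        i_bits, q_bits = label >> 2, label & 0b11
        points[label] = complex(GRAY_LEVELS[i_bits], GRAY_LEVELS[q_bits])
    return points


if __name__ == "__main__":
    for label, point in qam16_constellation().items():
        amplitude = abs(point)                        # distance from the center
        phase_deg = math.degrees(cmath.phase(point))  # angle of the point
        print(f"{label:04b} -> I={point.real:+.0f} Q={point.imag:+.0f} "
              f"amplitude={amplitude:.3f} phase={phase_deg:7.2f} deg")
```

The 16 points sit on three amplitude rings (√2, √10 and √18) at twelve distinct phases, so each 4-bit label corresponds to a unique amplitude/phase combination.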

Each constellation point is a unique combination of phase and amplitude, with phase represented by the angle and amplitude by the distance from the center of the diagram. With traditional modulation, each constellation point has the same probability of being used. This means the outer constellation points, with higher amplitude and therefore requiring more energy/power, have the same probability of being used as inner constellation points with lower energy/power, as shown in Figure 2.

Figure 2: Conventional QAM: equal probabilities
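A quick calculation shows what equal probabilities cost in energy. The sketch assumes the same ±1/±3 level grid as a stand-in for Figure 2 (arbitrary units) and computes the average symbol energy |x|² when all 16 points are equally likely, plus the share of that energy spent on the four corner points.

```python
from itertools import product

LEVELS = (-3, -1, 1, 3)  # assumed I/Q levels, arbitrary units
POINTS = [complex(i, q) for i, q in product(LEVELS, LEVELS)]


def energy(point):
    """Symbol energy |x|^2 = I^2 + Q^2 (exact for integer levels)."""
    return point.real ** 2 + point.imag ** 2


def average_energy(points, probs):
    """Expected energy per transmitted symbol for the given probabilities."""
    return sum(p * energy(x) for x, p in zip(points, probs))


# Conventional QAM: every point is equally likely.
uniform = [1 / len(POINTS)] * len(POINTS)

if __name__ == "__main__":
    avg = average_energy(POINTS, uniform)
    print("average energy:", avg)  # 10.0
    corners = [x for x in POINTS if energy(x) == 18]  # the four outermost points
    share = sum(energy(x) for x in corners) / len(POINTS) / avg
    print("energy share of the 4 corner points:", share)  # 0.45
```

The four corner points are used only a quarter of the time yet account for 45% of the average transmitted energy, which is the imbalance PCS sets out to exploit.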

Probabilistic Constellation Shaping (PCS) uses the lower-energy/-power inner constellation points more frequently and the higher-energy/-power outer constellation points less frequently, as shown in Figure 3. This enables it to deliver benefits such as enhanced granularity, improved tolerance to noise and/or nonlinearities, and baud rate flexibility, as described in my earlier blog “Probabilistic Constellation Shaping: Faster, Further, Smoother.”

Figure 3: Probabilistic constellation shaping
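One way to express "inner points more often, outer points less often" numerically is the Maxwell-Boltzmann distribution, P(x) ∝ exp(−ν|x|²), a discrete Gaussian-like choice that is common in the PCS literature. The sketch below applies it to the same assumed ±1/±3 16QAM grid; the shaping parameter ν and its values are hypothetical, and this is not a description of any specific product's target distribution.

```python
import math
from itertools import product

LEVELS = (-3, -1, 1, 3)  # assumed I/Q levels, arbitrary units
POINTS = [complex(i, q) for i, q in product(LEVELS, LEVELS)]


def energy(point):
    return point.real ** 2 + point.imag ** 2


def maxwell_boltzmann(points, nu):
    """P(x) proportional to exp(-nu * |x|^2); nu = 0 gives uniform QAM."""
    weights = [math.exp(-nu * energy(x)) for x in points]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Entropy in bits per symbol: the most information the shaped points can carry."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_energy(points, probs):
    return sum(p * energy(x) for x, p in zip(points, probs))


if __name__ == "__main__":
    for nu in (0.0, 0.05, 0.1, 0.2):  # hypothetical shaping parameters
        probs = maxwell_boltzmann(POINTS, nu)
        print(f"nu={nu:.2f}  entropy={entropy_bits(probs):.3f} bits/symbol  "
              f"average energy={average_energy(POINTS, probs):.3f}")
```

Increasing ν trades entropy (bits per symbol) for lower average energy. In practice the shaped constellation is rescaled to the same average launch power, so the saved energy becomes wider spacing between points, and ν can be tuned continuously to hit a target entropy, which is where the fine rate granularity comes from.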

But how does it work? The “secret sauce” of PCS is the distribution matcher, shown in Figure 4. While sophisticated in detail, at a high level it takes a uniform bit sequence (ones and zeros with equal probability) and converts it into symbols with a desired distribution, typically Gaussian-like. At the other end, a reverse distribution matcher converts these symbols back into the original bitstream.

Figure 4: The distribution matcher
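Commercial distribution matchers are proprietary, but one published approach, constant composition distribution matching (CCDM), is easy to sketch. A CCDM maps k uniform bits to one of the symbol sequences that contain each symbol a fixed number of times, so every output codeword has exactly the target distribution, and the reverse matcher recovers the bits by finding the sequence's position. The sketch below uses exact enumerative ranking rather than the arithmetic coding of the original CCDM proposal; the composition, the amplitude alphabet and all function names are hypothetical, and this is not the matcher used in any Infinera product.

```python
from collections import Counter
from math import factorial


def multinomial(counts):
    """Number of distinct sequences with exactly these symbol counts."""
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)
    return total


def unrank(index, composition):
    """Return the index-th sequence (lexicographic order) among all
    orderings of the multiset described by `composition`."""
    counts = dict(composition)
    seq = []
    for _ in range(sum(counts.values())):
        for sym in sorted(counts):
            if counts[sym] == 0:
                continue
            counts[sym] -= 1
            block = multinomial(counts)  # sequences that continue with `sym`
            if index < block:
                seq.append(sym)
                break
            index -= block
            counts[sym] += 1
    return seq


def rank(seq, composition):
    """Inverse of unrank: position of `seq` in lexicographic order."""
    counts = dict(composition)
    index = 0
    for s in seq:
        for sym in sorted(counts):
            if sym == s:
                break
            if counts[sym] > 0:
                counts[sym] -= 1
                index += multinomial(counts)  # skip sequences continuing with a smaller symbol
                counts[sym] += 1
        counts[s] -= 1
    return index


def matcher(bits, composition):
    """Uniform bits -> symbol sequence with a fixed composition."""
    return unrank(int("".join(map(str, bits)), 2), composition)


def dematcher(symbols, composition, num_bits):
    """Reverse matcher: symbol sequence -> the original bits."""
    return [int(b) for b in format(rank(symbols, composition), f"0{num_bits}b")]


# Hypothetical target: amplitudes 1, 3, 5, 7 with probabilities 0.4/0.3/0.2/0.1
# over a 10-symbol codeword.
COMPOSITION = {1: 4, 3: 3, 5: 2, 7: 1}
NUM_BITS = multinomial(COMPOSITION).bit_length() - 1  # largest k with 2**k <= 12600

if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    symbols = matcher(bits, COMPOSITION)
    print(symbols, Counter(symbols))
    assert dematcher(symbols, COMPOSITION, NUM_BITS) == bits
```

Here 13 bits ride on 10 four-level amplitude symbols (1.3 bits/symbol), while the entropy of the 0.4/0.3/0.2/0.1 target is about 1.85 bits/symbol. That shortfall, usually called rate loss, is the price of a short codeword.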

As a simplified example, suppose we take four bits (values 0 to 15) and map them to two symbols drawn from eight possible symbols (letters A to H), as shown on the left of Figure 5. Regardless of the incoming bitstream, we are then likely to see many more Ds and Es and far fewer As and Hs, giving the distribution shown on the right.

Note that we are not mapping bits to individual symbols but a string of bits to a combination of symbols, with the mappings in this example created manually to give the desired distribution. In this example we are using 6 bits (two 3-bit symbols) to transmit 4 bits’ worth of data, so our data rate is two-thirds of what we would get if we used the full modulation with all symbols used equally.

Figure 5: Example bits-to-symbol mapping and symbol distribution
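The toy scheme fits in a few lines of code. The table below is a hypothetical hand-made mapping in the spirit of Figure 5, not the figure's actual table: 16 distinct letter pairs biased toward D and E, with A, B, G and H each used only once. Because every pair is distinct, the reverse mapping is unambiguous.

```python
from collections import Counter

# Hypothetical 4-bit value -> two-symbol mapping (illustrative only).
MAPPING = {
    0: "DD", 1: "DE", 2: "ED", 3: "EE",
    4: "CD", 5: "DC", 6: "EF", 7: "FE",
    8: "CE", 9: "FD", 10: "BE", 11: "DG",
    12: "CF", 13: "FC", 14: "AE", 15: "DH",
}
INVERSE = {pair: value for value, pair in MAPPING.items()}


def encode(bits):
    """Map each group of 4 bits to a pair of 3-bit symbols (letters A-H)."""
    if len(bits) % 4:
        raise ValueError("bit count must be a multiple of 4")
    out = []
    for k in range(0, len(bits), 4):
        value = int("".join(map(str, bits[k:k + 4])), 2)
        out.extend(MAPPING[value])
    return out


def decode(symbols):
    """Reverse mapping: pairs of symbols back to 4-bit groups."""
    bits = []
    for k in range(0, len(symbols), 2):
        value = INVERSE[symbols[k] + symbols[k + 1]]
        bits.extend(int(b) for b in format(value, "04b"))
    return bits


if __name__ == "__main__":
    # With uniform input bits every 4-bit value is equally likely, so the
    # symbol distribution is the letter count over the whole table (32 letters).
    counts = Counter("".join(MAPPING.values()))
    for letter in "ABCDEFGH":
        print(letter, counts[letter], "/ 32")
    print("rate:", 4 / 6)  # 4 data bits carried by two 3-bit symbols
```

With uniform input, D and E each appear with probability 9/32, C and F with 5/32, and A, B, G and H with only 1/32, while each 3-bit symbol now carries 2 data bits, i.e. two-thirds of the unshaped rate.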

A real distribution matcher is more complicated than this simplified example. It needs to find a bits-to-symbol mapping that meets the desired data rate (effective bits per symbol) with the desired probability distribution, and it must do this in real time for a constantly changing stream of bits at the speeds required for 800 Gb/s wavelengths, including the overhead (e.g., 960 Gb/s with a 20% overhead or 1,000 Gb/s with a 25% overhead).
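The quoted line rates are simple arithmetic: the 800 Gb/s payload plus the overhead. The sketch also converts a line rate into the effective bits per symbol, per polarization, that the shaped constellation must carry (overhead included). The 100 GBd symbol rate and the dual-polarization assumption are illustrative values chosen for the example, not a product specification.

```python
def line_rate_gbps(net_gbps, overhead):
    """Net (client) rate plus FEC/framing overhead, given as a fraction (0.20 = 20%)."""
    return net_gbps * (1 + overhead)


def bits_per_symbol_per_pol(line_gbps, baud_gbd, polarizations=2):
    """Bits each symbol must carry, per polarization, to reach line_gbps."""
    return line_gbps / (baud_gbd * polarizations)


if __name__ == "__main__":
    for overhead in (0.20, 0.25):
        rate = line_rate_gbps(800, overhead)
        # 100 GBd is a hypothetical symbol rate used for illustration only.
        print(f"{overhead:.0%} overhead: {rate:.0f} Gb/s line rate, "
              f"{bits_per_symbol_per_pol(rate, 100):.2f} bits/symbol/pol at 100 GBd")
```

At the assumed 100 GBd, the 20% case needs 4.8 bits per symbol per polarization, between the 4 bits of 16QAM and the 5 bits of 32QAM; a shaped constellation can be tuned to such a non-integer target.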

How well it does this depends, to a large extent, on how much data it looks at, which we will refer to as the codeword length. A longer codeword increases the probability that a good match can be found. The game of chess provides a good analogy: the more moves you can think ahead, the better your chances of winning.

Figure 6: Codeword length vs. PCS gain
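The benefit of a longer codeword can be illustrated with the constant-composition matcher idea: with n symbols of fixed composition, the matcher can index at most n!/(n₁!·n₂!·…) distinct sequences, so it carries floor(log₂ of that) bits per codeword. The shortfall against the entropy of the target distribution (the rate loss) shrinks as n grows. This sketch uses a hypothetical 0.4/0.3/0.2/0.1 target; it demonstrates a well-known CCDM property and does not reproduce the PCS gain curve in Figure 6.

```python
import math
from math import factorial


def multinomial(counts):
    """Number of distinct sequences with exactly these symbol counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def ccdm_rate(probs, n):
    """Bits per symbol of a fixed-composition matcher with codeword length n."""
    counts = [round(p * n) for p in probs]
    if sum(counts) != n:
        raise ValueError("pick n so that n * p is an integer for every p")
    num_bits = multinomial(counts).bit_length() - 1  # floor(log2(#sequences))
    return num_bits / n


TARGET = (0.4, 0.3, 0.2, 0.1)  # hypothetical amplitude distribution

if __name__ == "__main__":
    h = entropy_bits(TARGET)
    for n in (10, 100, 1000, 10000):
        rate = ccdm_rate(TARGET, n)
        print(f"n={n:>5}: rate={rate:.4f} bits/symbol, rate loss={h - rate:.4f}")
```

The rate loss falls from about 0.55 bits/symbol at n = 10 to roughly 0.002 at n = 10,000, with most of the improvement already achieved by n ≈ 1,000: the same diminishing-returns pattern described for codeword length.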

As shown in Figure 6, a codeword length of around 100 symbols results in around half the gain of a codeword with a length of around 1,000 symbols, with diminishing returns as we go much beyond that. A codeword with a little more than 1,000 symbols is therefore enough to deliver almost all the potential gain of PCS. However, in addition to advanced algorithms, long-codeword PCS (LC-PCS) requires an ASIC/DSP with a 7-nm or better process node.

A common concern that comes up with PCS is latency. Does PCS add latency? Does a longer codeword mean more latency? The answer is that PCS, even with a long codeword, adds negligible latency, less than 10 nanoseconds, at least for the long-codeword PCS implementation in Infinera’s sixth-generation Infinite Capacity Engine (ICE6), which does not require buffering.

Another concern is power consumption. PCS requires additional processing in the digital ASIC/DSP, which has a cost in terms of power consumption. This can increase the “headline” power consumption, that is, the power consumption in watts per Gb/s at the maximum data rate over the shortest distance. It may also limit the adoption of PCS in ASICs/DSPs for compact coherent pluggables, where low power consumption is a top priority.

However, the increased capacity-reach enabled by PCS can give it an advantage when distance is included in power consumption comparisons (i.e., watts per Gb/s per km). Furthermore, the power consumption of PCS is reduced with power-efficient 7-nm process node ASIC/DSP technology, as used in Infinera’s ICE6 optical engine.

For more information on this important topic, see the new Infinera white paper “Faster, Further, Smoother: The Case for Probabilistic Constellation Shaping.”