# The 1:4 Phased Demultiplexer Circuit

Dr. Serafim Poriazis

### PHASETRONIC LABORATORIES

### **Abstract**

The behavior of the 1:4 Phased Demultiplexer (PDMUX4) circuit is analyzed. The circuit demultiplexes the input clock signal into four phased output signals by streaming sets of eight clock phases. A phase difference equal to half the clock period is maintained between consecutive output transitions. The VHDL description of the PDMUX4 cell is given, together with its simulation and synthesis results. A 2-level tree-like structure is built by applying the phased outputs of the PDMUX4 cell to the clock inputs of four cell replicas, which extends the circuit behavior. An EXOR4 gate attached to the PDMUX4 cell output ports aggregates all the phases carried by the phased clock signals while preserving their phase associations.

## 1 Introduction

Demultiplexer (DMUX) IC chips that target SONET OC-768 applications are able to operate error-free beyond 50 Gb/s [1]. The 1:4 DMUX for serial communication systems presented there uses a tree architecture with a recursive series of 1:2 demultiplexer stages, with a half-rate clock driving the first 1:2 demultiplexing stage.

The 1:4 DMUX with the Multi-phase Clock (MPC) architecture [2] consists of four parallel latch lines and a TFF that generates the four-phase clock. The IC was fabricated with InP HEMTs and confirmed to operate at up to 50 Gbit/s with 1.42-W power consumption. This circuit demultiplexes a 4f-(bit/s) input into four parallel f-(bit/s) outputs.

A phase detector that can also perform a 1:2 data demultiplexing function is given in [3]. The circuit incorporates flip-flops that are triggered by the falling and rising edges of the clock. The authors developed a pulse compensation technique that uses XOR2 gates to detect whether or not a transition exists during the period, and thereby obtains correct up/down pulses from the half-frequency clock.

The correct behavior of synchronous circuits depends upon the distribution of clock signals to the different parts of the circuit. As examined in [4], state machines can be implemented as synchronous circuits whose bistables and registers are each clocked by one of a set of periodic signals. In particular, the periodic signals can be phased.


The phase associations between the input clock signal and the four phased output signals produced by the 1:4 Phased Demultiplexer (PDMUX4) circuit are examined in this paper. The circuit offers a reliable solution to the challenging problem of synchronizing the individual modules of a multiphase model [5] whose operation adopts a 4-phase timing pattern. The present work is based on the principles of operation of the Two-Phase Twisted Ring Counter (2P-TRC) circuit [6] and targets data streaming applications. The unit phase duration used throughout the text is equal to half the period of the input clock signal. Tree-like structures based on the demultiplexer circuit are built and their behavior is analyzed. The EXOR operator is associated with the aggregation of the phased outputs produced by the PDMUX4. A transposition mechanism helps us build a primitive and an expanded PDMUX4/EXOR4 configuration, both of which fully preserve the phase relationships between the clock and the phased signals at the structural level of circuit design.

## 2 The PDMUX4 Cell

### A. The basic cell operation

The fundamental cell, which demultiplexes the clock signal *CLK* of frequency *f* into four phased output signals of frequency *f*/4, is called the 1-to-4 Phased Demultiplexer (PDMUX4) circuit and is shown in Figure 1. The four overlapping output signals $CLK_1$ = aDOUT1, $CLK_2$ = aDOUT2, $CLK_3$ = aDOUT3 and $CLK_4$ = aDOUT4 each have frequency *f*/4 (the period of each $CLK_i$, i = 1,2,3,4, equals $4 \cdot T$, where T is the period of CLK), with $CLK_1$ leading $CLK_2$, $CLK_2$ leading $CLK_3$ and $CLK_3$ leading $CLK_4$ by a phase difference of T/2. The logic-'1' or logic-'0' pulse width of each of these phased signals is equal to $4 \cdot T/2$. Consecutive changes of logic value on each signal $CLK_i$, for i = 1,2,3,4, occur at rising or falling edges of CLK and are separated by $4 \cdot T/2$.
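The timing relations stated above can be checked with a small behavioural sketch. It is an idealised model and not part of the original design: time is counted in units of T/2, and the function names `phased_clk` and `transitions` as well as the choice of half-period 0 as the all-low starting point are assumptions made for illustration.

```python
# Time is counted in half-periods of CLK (one unit = T/2), as in the text.
HALF_PERIODS_PER_CYCLE = 8  # the output period 4*T, expressed in units of T/2


def phased_clk(i: int, n: int) -> int:
    """Idealised CLK_i (i = 1..4) during half-period n: 4 units high, 4 units low."""
    return 1 if (n - i) % HALF_PERIODS_PER_CYCLE < 4 else 0


def transitions(i: int, span: int = 16) -> list[int]:
    """Half-period indices in 1..span at which CLK_i changes value."""
    return [n for n in range(1, span + 1)
            if phased_clk(i, n) != phased_clk(i, n - 1)]


for i in range(1, 5):
    waveform = "".join(str(phased_clk(i, n)) for n in range(16))
    print(f"CLK_{i}: {waveform}  transitions at {transitions(i)}")
```

In this model the transitions of each $CLK_i$ are 4 units (that is, $4 \cdot T/2$) apart, and the transitions of $CLK_{i+1}$ lag those of $CLK_i$ by one unit (T/2).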

### B. The valid codeword sequence

When the clock signal CLK is applied to the circuit, the following cyclic sequence of codewords is presented at the outputs: $CLK_1 CLK_2 CLK_3 CLK_4$ = 0000 $\rightarrow$ 1000 $\rightarrow$ 1100 $\rightarrow$ 1110 $\rightarrow$ 1111 $\rightarrow$ 0111 $\rightarrow$ 0011 $\rightarrow$ 0001, which constitutes the normal circuit operation. Each codeword remains stable for the state time of the circuit, that is T/2, and the sequence is repeated throughout the operation of the cell. Thus the cycle time of the output pattern is defined by the eight-tuple of codewords, whose total duration is $4 \cdot T$. This duration forms the period of each phased output signal.
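The cyclic sequence above has the form produced by a twisted-ring shift: at each step the word shifts by one position and the inverse of the last bit enters at the front. The sketch below reproduces the sequence with that rule. The shift rule and the helper names `next_codeword` and `codeword_cycle` are assumptions used for illustration only; the VHDL description in this paper stores the codewords in a lookup table instead.

```python
def next_codeword(word: str) -> str:
    """Shift by one position and feed the inverse of CLK_4 into CLK_1.

    The word is written in the order CLK_1 CLK_2 CLK_3 CLK_4.
    """
    inverted_last = "1" if word[-1] == "0" else "0"
    return inverted_last + word[:-1]


def codeword_cycle(start: str = "0000") -> list[str]:
    """Follow next_codeword from start until the sequence returns to start."""
    cycle = [start]
    while (nxt := next_codeword(cycle[-1])) != start:
        cycle.append(nxt)
    return cycle


cycle = codeword_cycle()
print(" -> ".join(cycle))

# Number of outputs that change at each step of the cycle.
hamming = [sum(a != b for a, b in zip(w, next_codeword(w))) for w in cycle]
print("bits changed per step:", hamming)
```

Exactly one output changes per step, which matches the statement that consecutive output transitions are T/2 apart.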



Figure 1. The PDMUX4 basic cell block diagram

The remaining 8 of the 16 possible codewords, which are not included in the above cyclic sequence, must be taken into account during the design of the circuit to achieve reliable operation of the PDMUX4 cell. If the circuit reaches any of these 8 invalid codewords, an invalid codeword flag is set at the output of the cell. This flag forces the reset input ports of cascaded PDMUX4 cells into the reset state, which maintains the proper initializing behavior until a valid codeword appears on the output port of the cell.
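As a quick illustration of the valid/invalid partition, the following sketch enumerates all 16 four-bit words and flags those outside the valid cycle. The function name `invalid_flag` is hypothetical, and the flag is modelled as a plain membership test; how the flag is actually generated in hardware is described by the VHDL in Figure 2.

```python
from itertools import product

# Valid codewords in the order CLK_1 CLK_2 CLK_3 CLK_4, as listed in the text.
VALID = ["0000", "1000", "1100", "1110", "1111", "0111", "0011", "0001"]


def invalid_flag(word: str) -> int:
    """Return 1 if word is not one of the eight valid codewords, else 0."""
    return 0 if word in VALID else 1


all_words = ["".join(bits) for bits in product("01", repeat=4)]
invalid = [w for w in all_words if invalid_flag(w)]
print(len(invalid), "invalid codewords:", invalid)
```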

### C. The algorithm aspects of cell operation

The PDMUX4 cell operation is implemented by the VHDL description shown in Figure 2. The VHDL entity section has an input port CLK, to which the clock signal of frequency f is applied, and an input port RESET, to which a reset flag is applied. The phased output signals of the cell, of frequency f/4, are assigned to the four-bit port PCLK[4..1], with PCLK1 = aDOUT1, PCLK2 = aDOUT2, PCLK3 = aDOUT3 and PCLK4 = aDOUT4. The output port RSTFLAG signals the invalid codeword status of PCLK, or the cell reset state, which suspends the streaming of phases from CLK towards the outputs of the cell. The VHDL architecture section is of type "behavioral" and uses a state machine model with two internal registers, reg1 and reg2: the first holds the present state named "present\_state1" and is clocked by the rising edge of the clock, while the second holds the present state named "present\_state2" and is clocked by the falling edge of the clock. The next state logic block and the output logic block of the model are specified by the processes "next\_state\_logic" and "output\_logic", respectively. The set of eight valid codewords of the circuit is stored in an indexed array of size 8\*4 = 32 bits, represented by the constant named "phased\_output". The index of this array cycles through the integer values 1 to 8, specifying the valid codeword entry for the next state signal. Whenever the RESET input is set, that is RESET = '1', the output PCLK[4..1] can cycle only through the values "0000" and "1111", thus assuring proper initialization of the circuit at either the rising or the falling edge of CLK.
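The index-driven behaviour described above can be sketched as a simplified edge-driven model. This is an illustrative abstraction and not a translation of the VHDL in Figure 2: the class name `Pdmux4Model` and its method `on_edge` are hypothetical, the two registers are merged into a single index, and the RSTFLAG logic and simulator delta-cycle timing are left out.

```python
# Valid codewords in the order CLK_1 CLK_2 CLK_3 CLK_4; entry k has index k + 1.
CODEWORDS = ["0000", "1000", "1100", "1110", "1111", "0111", "0011", "0001"]


class Pdmux4Model:
    """Edge-driven model: call on_edge once per rising or falling CLK edge."""

    def __init__(self) -> None:
        self.index = 1  # 1-based index into CODEWORDS, as in the text

    def on_edge(self, rising: bool, reset: bool = False) -> str:
        if reset:
            # In reset the output is "0000" after a rising edge
            # and "1111" after a falling edge.
            self.index = 1 if rising else 5
        else:
            # Normal operation: step to the next valid codeword, wrapping 8 -> 1.
            self.index = self.index % 8 + 1
        return CODEWORDS[self.index - 1]


cell = Pdmux4Model()
# Three edges with RESET asserted, then eight edges of normal operation.
edges = [(True, True), (False, True), (True, True)]
edges += [(k % 2 == 1, False) for k in range(8)]
trace = [cell.on_edge(rising, reset) for rising, reset in edges]
print(trace)
```

While RESET is asserted the model alternates between "0000" and "1111"; once RESET is released it steps through the eight valid codewords, one per clock edge.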

### D. The VHDL simulation and synthesis

The VHDL testbench simulation results for the PDMUX4 cell are given in Figure 3. The duration of this simulation is defined by the value of the signal "done". The signals "next\_state", "present\_state1", "present\_state2" and PCLK each have a width of 4 bits, and each value shown on these signal waveforms is hexadecimal. The output port PCLK is also broken out into four individual output signals, whose waveforms verify the correct operation of the circuit. The "index" signal has decimal values and gives the index into the array of valid codewords. The logic value changes of PCLK occur at each rising and each falling edge of the input signal CLK.

The synthesis of the PDMUX4 cell targeting an FPGA device completed successfully, with the following results:

- flip flops with asynchronous reset = 4
- flip flops with asynchronous preset = 4
- combinational feedback paths = 37
- combinational logic area estimate = 103 LUTs

```
library IEEE;
use IEEE.std_logic_1164.all;

entity PDMUX4 is
   port (CLK, RESET : in  std_logic;
         RSTFLAG    : out std_logic;
         PCLK       : out std_logic_vector(4 downto 1));
end PDMUX4;

architecture behavioral of PDMUX4 is
   -- Table of the eight valid codewords; the leftmost element is PCLK(4).
   type validcodewords is array(1 to 8) of std_logic_vector(4 downto 1);
   constant phased_output : validcodewords :=
      (('0','0','0','0'), ('0','0','0','1'), ('0','0','1','1'), ('0','1','1','1'),
       ('1','1','1','1'), ('1','1','1','0'), ('1','1','0','0'), ('1','0','0','0'));
   signal index : integer := 0;
   signal present_state1, present_state2, next_state : std_logic_vector(4 downto 1);
   signal invalidcode_flag : std_logic := '0';
begin
   -- Present-state register clocked by the rising edge of CLK.
   reg1 : process (CLK, RESET)
   begin
      if RESET = '1' then present_state1 <= phased_output(1);
      elsif (CLK = '1' and CLK'event) then present_state1 <= next_state;
      end if;
   end process;

   -- Present-state register clocked by the falling edge of CLK.
   reg2 : process (CLK, RESET)
   begin
      if RESET = '1' then present_state2 <= phased_output(5);
      elsif (CLK = '0' and CLK'event) then present_state2 <= next_state;
      end if;
   end process;

   -- Advances the table index and selects the next valid codeword.
   next_state_logic : process (CLK, RESET)
   begin
      case CLK is
         when '1' => if RESET = '1' then index <= 1; else index <= index + 1; end if;
         when '0' => if RESET = '1' then index <= 5; else index <= index + 1; end if;
         when others => null;
      end case;
      if index < 8 then next_state <= phased_output(index + 1);
      else next_state <= phased_output(1); index <= 1; end if;
      -- Raise the flag when next_state matches none of the valid codewords.
      for i in 1 to 8 loop
         if next_state = phased_output(i) then invalidcode_flag <= '0'; exit;
         else invalidcode_flag <= '1'; end if;
      end loop;
   end process;

   -- Selects the register that drives PCLK according to the level of CLK.
   output_logic : process (index, present_state1, present_state2)
   begin
      case CLK is
         when '1' => PCLK <= present_state1;
                     if RESET = '1' then RSTFLAG <= '1'; else
                     RSTFLAG <= invalidcode_flag; end if;
         when '0' => PCLK <= present_state2;
                     if RESET = '1' then RSTFLAG <= '1'; else
                     RSTFLAG <= invalidcode_flag; end if;
         when others => null;
      end case;
   end process;
end behavioral;
```

Figure 2. The VHDL description of the PDMUX4 basic cell

## 3 The Phase Associations of the PDMUX4 Signals

We examine the associations between the input clock signal CLK and the phased output signals  $PCLK1=CLK_1$ ,  $PCLK2=CLK_2$ ,  $PCLK3=CLK_3$ ,  $PCLK4=CLK_4$  of the PDMUX4 cell. It is evident that the EXOR function can be utilized to express the following relationships:

$$CLK = CLK_{1} \oplus CLK_{2} \oplus CLK_{3} \oplus CLK_{4} \tag{1}$$

$$\begin{aligned}
CLK_{1} &= CLK \oplus CLK_{2} \oplus CLK_{3} \oplus CLK_{4} \\
CLK_{2} &= CLK \oplus CLK_{3} \oplus CLK_{4} \oplus CLK_{1} \\
CLK_{3} &= CLK \oplus CLK_{4} \oplus CLK_{1} \oplus CLK_{2} \\
CLK_{4} &= CLK \oplus CLK_{1} \oplus CLK_{2} \oplus CLK_{3}
\end{aligned} \tag{2}$$

Equation (2) satisfies an extended transposition mechanism for the EXOR operator, which states the following: "if $f=g\oplus h$, then $g=f\oplus h$ and $h=g\oplus f$", where f, g and h are binary variables. Thus an EXOR4 gate can be attached to the outputs of the PDMUX4 cell in order to aggregate the phased signals $CLK_1$, $CLK_2$, $CLK_3$, $CLK_4$ and produce a replica of the input clock signal, that is, CLK\_OUT = CLK.
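The EXOR relationships can be checked numerically. The sketch below is an illustrative check rather than part of the design: it applies a four-input EXOR (the helper name `exor4` is hypothetical) to the valid codeword sequence and then confirms the transposition property for every possible four-bit word.

```python
from functools import reduce
from itertools import product
from operator import xor

# Valid codewords (CLK_1, CLK_2, CLK_3, CLK_4) in sequence order, one per T/2.
VALID = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
         (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)]


def exor4(bits) -> int:
    """Four-input EXOR: parity of the four phased signals."""
    return reduce(xor, bits)


# Aggregated output for each successive codeword of the valid sequence.
clk_out = [exor4(word) for word in VALID]
print("EXOR4 output per half-period:", clk_out)

# Transposition: exchanging CLK with any CLK_i keeps the equality true.
for word in product((0, 1), repeat=4):
    clk = exor4(word)
    for i in range(4):
        others = word[:i] + word[i + 1:]
        assert word[i] == reduce(xor, others, clk)
print("transposition holds for all 16 words")
```

The aggregated value toggles on every step of the sequence, that is every T/2, which is the behaviour of a clock of period T.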

A primitive PDMUX4/EXOR4 configuration is thus formed by two modules, the PDMUX4 cell and the EXOR4 gate, which are interconnected via the phased signals $CLK_1$, $CLK_2$, $CLK_3$, $CLK_4$. All referenced signals of this configuration maintain the transposition mechanism stated above.

The outputs $CLK_1$, $CLK_2$, $CLK_3$, $CLK_4$ of the PDMUX4 are themselves periodic signals that can in turn drive replicas of the cell, thus forming a tree-like structure. A new phased signal pattern of period $16 \cdot T$ is thus produced, with output signals $w_1, w_2, w_3, \dots, w_{16}$ (b1DOUT1, b2DOUT1, ..., b4DOUT4), where $w_i$ leads $w_{i+1}$ (i = 1, 2, ..., 15) by a phase difference equal to T/2. This structure (PDMUX4\*4) is composed of two levels of cells, as shown in Figure 4(a). In terms of the frequency of the driving signals, the first level is driven by CLK of frequency f and the second level is driven by $y_1$ = aDOUT1, $y_2$ = aDOUT2, $y_3$ = aDOUT3, $y_4$ = aDOUT4 of frequency f/4. By using an inverse-tree-like structure of EXOR4 gates we can apply the EXOR function to the pattern of the phased signals $w_1, w_2, w_3, \dots, w_{16}$ and form a 2-level expanded PDMUX4/EXOR4 configuration. Its aggregated output signal CLK\_OUT is a replica of the input clock signal *CLK*, giving $w_1 \oplus w_2 \oplus w_3 \oplus \dots \oplus w_{16}$ = CLK\_OUT = CLK, as shown by the associated simulation results in Figure 4(b).
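The two-level structure can be explored with an idealised, edge-driven sketch. It makes several assumptions that are not taken from the paper: each cell is modelled as a twisted-ring shift that advances on every edge of its clock input, all cells start from the all-zero codeword, propagation delays and the reset logic are ignored, and the names `Cell` and `simulate` are hypothetical.

```python
from functools import reduce
from operator import xor


class Cell:
    """Idealised PDMUX4: every edge on its clock input advances the codeword."""

    def __init__(self) -> None:
        self.out = [0, 0, 0, 0]  # outputs 1..4, initially the codeword 0000

    def clock_edge(self) -> None:
        # Shift by one position and feed the inverse of output 4 into output 1.
        self.out = [1 - self.out[3]] + self.out[:3]


def simulate(half_periods: int) -> list[list[int]]:
    """Return w_1..w_16 of the two-level tree after each CLK edge."""
    root = Cell()
    leaves = [Cell() for _ in range(4)]
    history = []
    for _ in range(half_periods):
        before = root.out
        root.clock_edge()                     # every CLK edge clocks the first level
        for j in range(4):
            if root.out[j] != before[j]:      # an edge on y_(j+1) clocks leaf j+1
                leaves[j].clock_edge()
        # Ordering b1DOUT1, b2DOUT1, b3DOUT1, b4DOUT1, b1DOUT2, ..., b4DOUT4.
        history.append([leaves[j].out[k] for k in range(4) for j in range(4)])
    return history


history = simulate(64)
first_rise = [next(n for n, row in enumerate(history, start=1) if row[i])
              for i in range(16)]
clk_out = [reduce(xor, row) for row in history]
print("first rising edge of w_1..w_16 (units of T/2):", first_rise)
print("EXOR of all 16 outputs, first 8 half-periods:", clk_out[:8])
```

In this model each $w_{i+1}$ rises one unit (T/2) after $w_i$, the whole pattern repeats after 32 units ($16 \cdot T$), and the EXOR of the 16 outputs toggles on every CLK edge.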

The synthesis of the 2-level expanded PDMUX4/EXOR4 circuit configuration targeting an FPGA device completed successfully, with the following results:

- flip flops with asynchronous reset = 20
- flip flops with asynchronous preset = 20
- combinational feedback paths = 185
- combinational logic area estimate = 505 LUTs

## 4 Conclusion

The 1:4 Phased Demultiplexer (PDMUX4) circuit is the basic cell considered in this paper. Its operation is analyzed: the eight valid codewords that appear at the phased output signals cycle in a predetermined sequence of duration $4 \cdot T$, where T is the period of the input clock signal *CLK*. The VHDL description of the PDMUX4 cell is given. The corresponding simulation results verify the proper circuit behavior, with the phased output signals PCLK[4..1] maintaining the specified phase associations with *CLK*. Synthesis results targeting an FPGA device are also given.

A 2-level expanded PDMUX4/EXOR4 configuration is formed by tree-like structures of the basic cell and the EXOR gate. The input *CLK* is demultiplexed into 16 phased signals, which are then aggregated into an output *CLK\_OUT* while preserving all the embedded phase associations of these signals. The simulation results verify the circuit behavior of the 2-level expanded configuration, while the corresponding synthesis results show the complexity requirements for circuit implementation.

## References

- [1] M. Meghelli, A.V. Rylyakov, and Lei Shan, "50 Gb/s SiGe BiCMOS 4:1 multiplexer and 1:4 demultiplexer for serial communication systems," in Proceedings of the Solid-State Circuits Conference, 2002, Digest of Technical Papers, ISSCC, 2002 IEEE International, San Francisco, CA, USA, pp 460-465, vol. 1, 3-7 February 2002.
- [2] K. Sano, K. Murata, H. Kitabayashi, S. Sugitani, H. Sugahara, and T. Enoki, "1.4-W 50-Gbit/s InP HEMT 1:4 Demultiplexer IC with a Multi-phase Clock Architecture," in Proceedings of the Microwave Symposium Digest, 2003 IEEE MTT-S International, pp 1181-1184, Vol. 2, 8-13 June 2003.
- [3] K. Nakamura, M. Fukaishi, H. Abiko, A. Matsumoto, and M. Yotsuyanagi, "A 6 Gbps CMOS Phase Detecting DEMUX Module Using Half-Frequency Clock," in Proceedings of the VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, Honolulu, HI, USA, pp 196-197, 11-13 June 1998.
- [4] S. Poriazis, Theory and Design of Multiphase Synchronous Circuits, PhD thesis, Cranfield University, College of Aeronautics, Electronic System Design, England, 1994.
- [5] S. Poriazis, Logic Design of Multiphase Finite State Machines, Monograph, Phasetronic Laboratories, http://www.phasetroniclab.com, Athens, Greece, 2001.
- [6] S. Poriazis, "The Two-Phase Twisted-Ring Counter Circuit," in Proceedings of the 2002 IEEE International Symposium on Circuits and Systems (ISCAS'02), Phoenix, Arizona, USA, vol. IV, pp. 858-861, 26-29 May, 2002.



Figure 3. The PDMUX4 basic cell operation (VHDL simulation results)





Figure 4. The 2-level expanded PDMUX4/EXOR4 circuit configuration

- (a) block diagram
- (b) VHDL simulation results