# An Adaptive Fuzzy Logic-based Routing Algorithm for Networks-on-Chip

Masoud Dehyadegari<sup>1</sup>, Masoud Daneshtalab<sup>1,2</sup>, Masoumeh Ebrahimi<sup>2</sup>, Juha Plosila<sup>2</sup>, Siamak Mohammadi<sup>1</sup> <sup>1</sup>School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran <sup>2</sup>Department of Information Technology, University of Turku, Turku, Finland {masdan, masebr, juplos}@utu.fi {mdehyadegari, smohamadi}@ut.ac.ir

## Abstract

In this paper, we propose an adaptive routing algorithm based on fuzzy logic in which each link cost is dynamically determined based on the current network condition. Using this algorithm, the traffic is distributed through the nodes that are less congested or have a spare capacity. The technique using a fuzzy controller takes advantage of two factors that are the number of empty spaces in the buffer of each neighbor and the waiting time for the previous packet. The output of the fuzzy controller is the link cost so that in each router, the link with the lowest cost is chosen as the optimal route. To evaluate the proposed routing method, we have used two multimedia applications and a random traffic profile. The experimental results show that the proposed routing scheme improves the performance up to 30% with a negligible hardware overhead.

Keywords: Networks-on-Chip, Fuzzy Logic, Routing algorithms.

## 1. Introduction

Nowadays, SoC designers employ traditional buses or hierarchical bus structures to interconnect several processing elements (PEs). Buses cannot transfer more than one data stream simultaneously so that they are a bottleneck in future many core architectures. In addition, the major challenge that SoC researchers face today is to come up with structured, scalable, reusable and high performance interconnection architectures/platforms for integrating a large number of heterogeneous cores on a single chip. To meet these challenges, many research groups have concurrently proposed the idea of using a packet switched communication network, similar to one used in computer networks, for on chip communication in the realm of multicore architectures. This communication platform is named Network-on-Chip (NoC) [1][2][3][4][5][6].

NoC is a packet switching communication providing scalability and reusability of communication infrastructure. These features reduce time to design and time to market for new products. The performance and efficiency of NoC highly depend on the underlying communication infrastructure; this, in turn, relies on the performance of the on-chip routers. Thus, the design of high performance onchip routers represents a critical issue for the success of the NoC approach. The router is the main component of the interconnection architecture for transferring information from a source to a destination. The router has a functionality of transferring information from one of its input ports to any of its output ports. A network consists of an interconnection of many routers to enable a large number of cores to communicate with each other.

A typical mesh-based NoC architecture, shown in Figure 1, has recently proposed as a solution for the complex on-chip communication problems [1][6][7][8]. This architecture consists of a grid of typical routers each connected to a PE (each PE can be a general-purpose processor, a DSP, a memory subsystem, etc.).

The result of an efficient routing scheme is to increase throughput for the same value of average delay per packet under high offered load conditions and to decrease average delay per packet under low and moderate load condition.

The routing algorithm defines the path taken by a packet between the source and the destination. The two main components of routing algorithms are path determination and information transportation from a source to a destination. The information transportation refers to the packet switching communication. Although it is relatively straightforward, the path determination can be very complex. Routing algorithms use some metrics to evaluate which path will be the best for a packet to travel. Metrics such as path length (hop count), reliability, delay, bandwidth, and load are used to determine the optimal route. In our proposed routing scheme, the number of empty space in neighbor's buffer and the waiting time for previous packet are exploited as routing metrics.

Fuzzy Controllers are classic knowledge-based controllers using artificial techniques with origins in fuzzy logic. The fuzzy control system accounts for ambiguities in a data by giving a level of confidence rather than declaring the data simply true or false. The fuzzy logic uses linguistic descriptions to define the relationship between the input information and the output action.



Figure 1. Tile-based 2D-Mesh topology.

In this paper, we propose an adaptive routing algorithm, named Fuzzy-based Routing Algorithm (FRA), utilizing the fuzzy logic controller. The controller takes advantage of two inputs, the number of empty spaces in the buffer of each neighbor and the waiting time for the previous packet. The output of the fuzzy controller is the link cost so that, in each router, the link with the lowest cost is chosen as the optimal route. Using this routing scheme, the traffic is distributed through the nodes which are less congested or have a spare capacity. Hence, the average latency can be reduced significantly.

The rest of the paper is organized as follows. In Section 2, we give a brief review of the related work. The description of fuzzy controller is presented in section 3 while the implemented NoC environment and the proposed routing scheme are described in section 4 and 5, respectively. Experimental results are discussed in section 6 and section 7 concludes the paper.

#### 2. Related Work

There have been significant published works on efficient routing schemes in parallel and distributed computing areas and several on-chip routers have been proposed for NoC. Hu and Marculescu in [9] proposed the DyAD routing algorithm for on-chip networks where routers switches between the deterministic and adaptive routing algorithms according to the network's congestion condition. When the traffic is not heavy, the deterministic routing algorithm is used and when there is a heavy traffic, the adaptive routing algorithm is employed.

Shin et al. [10] proposed a hybrid switching scheme that dynamically combines both wormhole switching [11] and virtual cut-through [12] to provide higher achievable throughput compared to wormhole switching alone, while reducing the buffer space required at the intermediate nodes when compared to virtual cut-through. Nilson et al. [13] proposed a Proximity Congestion Awareness, PCA, technique to avoid congestion area. The load can be spread over a larger area by using different routing rules. PCA can be used to make the load distribution more uniform. Information to help the routers in their routing decision is sent between the routers. The informative value PCA is called stress value and is sent from one router to its neighbors in all directions. The stress value relates to the load level in that router. The surrounding routers receive four such values from their neighbors that help them to have a view of the surrounding routers' condition.

Wiklund and Liu [14] introduced the concept of packet connected circuit, PCC, where a packet is switched through the network locking the circuit as it goes. PCC is deadlock free and does not impose any unnecessary restrictions on the system while being simple and efficient in implementation. SoCBUS uses this PCC scheme to set up routes through the network.

Network on Silicon was proposed by Philips Research Laboratory as a communication network to integrate IP blocks and memory modules on the same chip which can guarantee the performance. The network must support guaranteed-throughput (GT) for real time applications and best-effort (BE) communication where throughput is not guaranteed, but no data is lost. They proposed a connection oriented crossbar based routers to build the network for GT routing [15].

An adaptive deadlock free routing algorithm, called Dynamic XY (DyXY), has been proposed in [16]. In this scheme, based on the static XY algorithm, a packet is sent either to the X or Y direction depending on the congestion condition. It uses local information which is the current queue length of the corresponding input port in the neighboring routers to decide on the next hop. It is assumed that the collection of these local decisions should lead to a near-optimal path from the source to the destination. The main weakness of DyXY is that the use of the local information, which is based on the stress value (available buffer), in making routing decision could forward the packet in a congested area as employing stress value is not sufficient.

Lotfi-Kamran et al. [17] introduced the Enhanced DyXY routing algorithm, EDXY, where the congestion information of a router is propagated to its row and column nodes via separate wires. Non-local information provided by these wires is used only when the packet is one row or column away from destination. Furthermore, Balanced Adaptive Routing Protocol (BARP) [18] is proposed for onchip networks to provide adaptive routing and ensure deadlock-free routing by evenly distributing input packets of a router among all its shortest path output ports.

In this paper, a fuzzy-based adaptive routing algorithm is proposed. It utilizes the link cost which is dynamically determined based on the present conditions of the network. Using this algorithm, the traffic is distributed through the nodes that are less congested or have a spare capacity.



Figure 2. General fuzzy system.

#### 3. Fuzzy Controller

The fuzzy controller or Fuzzy Inference System (FIS) is a fuzzy system model employing fuzzy sets to represent the semantic properties of each control and solution variable along with processes its input and output by using a set of production IF — THEN rules. The rules associate an input value, through a collection of fuzzy sets, into a new output representation.

As illustrated in Figure 2, a fuzzy controller consists of three parts: fuzzification, fuzzy inference systems, and defuzzification.

Inputs are normalized in the range of (0,1), then determine the degree to which they belong to each of the appropriate fuzzy sets via membership functions. The Fuzzification transforms the crisp value of the input variable into a fuzzy set. Fuzzy inference methods are for approximate reasoning, which represents their inherent property for approximate reasoning and for dealing with uncertainties which consist of a number of conditional IF — THEN rules that used to describe the system adequately.

Defuzzification is the process of producing a quantifiable result in fuzzy logic and converts the fuzzy control action into a crisp value. Two methods for defuzzification are widely used:

1. The Center-Of-Gravity method (COG). This method finds the geometrical centre.

2. The Mean-Of-Maxima method (MOM). This method finds the value which has maximum membership degree according to the fuzzy membership function.

In this paper most of membership function are chosen to be triangular that will be explained by detail in next section. An implication formula is used to evaluate the individual IF — THEN rules in the rule base. The implication formula provides a membership function that measures the degree of truth of the implication relation between the input and output variables [19].

## 4. NoC Platform

A 2D-mesh NoC based system is shown in Figure 1. It consists of Routers (R), Processing Elements (PE), links between routers, and Network Interfaces (NI). PEs may be intellectual property (IP) blocks or embedded memory modules. Each core is connected to the corresponding router port using the network interface. Routers have layered structured and contains address decoder, fuzzy controller, routing logic. arbitration logic. and communication ports. Communication ports directed to other routers or cores. The communication ports include input and output channels. Each router has five bidirectional ports: East, West, North, South, and Local. Each input port has a buffer to storage the incoming packet temporarily. The Local port communicates between the router and its local core. The other ports of the router are connected to neighbor routers, as presented in Figure 1.



Figure 3. The DyXY router architecture [17].

## 5. The FRA Routing Algorithm

The deterministic routing schemes such as the XY routing scheme are more susceptible to congestion as it provides no alternative choice for routing when network becomes congested. When using adaptive routing, the deadlock problem should be considered, which may be caused by packets waiting for each other in a cycle.



Figure 4. The membership functions: (a) for the time of previous packet and (b) for the number of empty spaces in neighbor's buffer.

One way to achieve deadlock free adaptive routing is achieved by using virtual channels (VCs) [20]. However, adding virtual channels does not come for free as it requires extra buffering space and complex control logic for routers. Some partial adaptive routing algorithms such as Odd-Even [21] and DyAD [9] do not use virtual channels for avoiding deadlock so that the routing algorithms need to prohibit at least one turn in each of the possible routing cycles [22] which reduce the degree of adaptiveness. FRA is based on the DyXY routing algorithm utilizing virtual channels not only to avoid deadlock but also to improve the performance.

The main difference between the XY and the DyXY routers is that, depending on the network condition, the route computation unit may select different paths at different times for the same source and destination pair. For this to happen, a pre-port selection unit is added to the router. Based on the current network conditions, pre-port selection unit selects the best candidate between adjacent ports (i.e., North vs. East, North vs. West, South vs. East, and South vs. West) and provides routing computation unit with this information (Figure 3). One of the factors for choosing an output port is the number of free buffers at the corresponding input port in the next hop. This technique has been used in the DyXY routing algorithm where the free buffer count at a downstream node is used for congestion estimation. To exchange the count, which can be considered as a stress value, some wires are added between adjacent routers. Due to the fact that in each node, packets can be routed in both X and Y directions without restriction, this routing algorithm needs a mechanism to guarantee deadlock avoidance. In networks having virtual channels (general case), usually the following method is used to guarantee deadlock avoidance. Virtual channels in Y dimension are divided into two parts. Therefore, The network is partitioned into two sub-networks called +X sub-network and -X sub-network each having half of the channels in the Y dimension. If the destination node is to the right of the source, the packet will be routed through the +X subnetwork. If the destination node is to the left of the source, the packet will be routed through the -X sub-network. Otherwise that packet can be routed using either subnetworks [23].

Routing algorithms can use many different metrics to determine the best route. Some of the most popular metrics are path length, reliability, delay, and bandwidth.

In our proposed scheme, it is assumed that the links between routers have the same transmission bandwidth and length. These assumptions are logical because the propagation delay of a traffic flow in the high performance communication is normally very small compared with its queuing delay at the routing nodes. Each router has an incoming buffer at each direction.

| Waiting time<br>Free slots | Z  | VS | S  | М | L |
|----------------------------|----|----|----|---|---|
| Z                          | L  | L  | L  | L | L |
| S                          | М  | М  | М  | L | L |
| М                          | VS | VS | S  | S | М |
| L                          | Ζ  | Ζ  | VS | S | М |

Table1. Fuzzy Rule Base.

FRA provides a new paradigm for NoC router design using the fuzzy controller to reach larger performance gain than the other routers. As explained above, the fuzzy controller has three parts to select a better route for packets, fuzzification, inference engine, and defuzzification. The membership functions for the fuzzy set are chosen to be triangular. Since the router is an adaptive router, there can be more than one output direction to route packets. The crisp inputs are the number of empty spaces in neighbor's buffer (free slots) and the waiting time of the previous packet. These two factors are the inputs of fuzzy controller of each port. The output of fuzzy controller is the link cost of the output port that is used in controller logic to select between two possible paths for sending packets.

The range of membership function for the number of empty spaces in neighbor's buffer is 0 to 8 and for waiting time of previous packet is 0 to 32. Membership functions are shown in Figure 4(a) and (b). To describe the fuzzy rules, we use Z, VS, S, M, and L to indicate "Zero", "Very Small", "Small", "Medium", and "Large" respectively. The rules of the fuzzy controller (or FIS) are designed for an optimal performance. Table 1 shows the rule base for the fuzzy controller. The centroid defuzzification method [24][25] is used to produce crisp value to choose adequate route. The architecture of the FAR router is illustrated in Figure 5. For instance, if a router is located in location "10" (Figure 1) receives a packet with destination "01", the proposed routing scheme, first decides to send the packet to either "11" or "00" router. To identify which of them is more efficient, the link costs of two possible paths, which computed by the presented fuzzy controller, are compared and the path with lower link cost will be chosen as appropriate path.



Figure 5. The FAR router architecture.

## 6. Experimental Results

To assess the proposed adaptive routing algorithm, we compare the presented scheme with DyXY under two types of traffic, synthetic and multimedia traffics. We have developed a synthesizable wormhole NoC simulator implemented in VHDL to evaluate the efficiency of FRA. This simulator can be used for wormhole switching in twodimensional mesh configuration. The simulator inputs include the array size, the routing algorithm, the link width, buffer size, and the traffic type. For all routers, the data width was set to 32 bits, and each input channel has a buffer (FIFO) size of 8 flits. For the performance metric, we use the latency defined as the number of cycles between the initiation of the message operation and the time when the tail of the message reaches the destination. Furthermore, the delays of computing resources for each component in a router including the fuzzy system have been considered in the results.

#### 6.1. Synthetic Traffic

The proposed scheme is evaluated on various load condition in a  $10 \times 10$  mesh topology compared with conventional DyXY routing scheme. Two different synthetic traffic profiles are chosen, uniform and hotspot. In the uniform traffic model, each IP core or PE sends a packet to any other node with equal probability while in the hotspot traffic model, a PE receives an extra portion of traffic rather the other nodes (here we assume the node 5,5 receives 10% more traffic).

To evaluate the efficiency of the presented scheme, different packet size, 10, 50, 100, and 200, are used in the

simulation results. As illustrated in in Figure 6(a) and (b), the proposed scheme has lower delay compared to the DyXY scheme under both traffic profiles. The performance gain of the proposed scheme for each packet size is around 20% and 30% under the uniform and hotspot traffic models, respectively.

### 6.2. Multimedia Traffic

FRA is also evaluated under two realistic case studies mapped onto a  $3\times4$  mesh topology. We select two different video processing applications: Video Object Plane Decoder (VOPD) and MPEG4 decoder [26]. Figure 7 and Figure 8 depict that the VOPD and MPEG4 Decoder block diagrams with communication bandwidth in MB/sec are mapped onto  $3\times4$  mesh topologies, respectively. As shown in Figure 9, FRA decreases the latency considerably where the performance gain is up to 24% and 25% under the MPEG and VOPD traffic profiles, respectively, compared with the DyXY routing algorithm.

#### 6.3. Hardware Overhead

For appraising the area overhead of the router utilizing the proposed routing controller, the router was synthesized by Synopsys D.C. using the UMC 90nm technology with an operating point of 1GHz and supply voltage 1V. We performed place-and-route, using Cadence Encounter, to have precise area estimations. Comparing the area cost of the router using DyXY to the router employing FAR, indicates that the hardware overhead of implementing a router using the proposed adaptive routing algorithm is less than 5% and that can be considered negligible.

## 7. Conclusion

We propose an adaptive routing algorithm for on-chip networks employing fuzzy logic mechanism. The number of empty spaces in neighbor's buffer and waiting time for previous packet are chosen as two inputs of the fuzzy controller while the link cost is the output of the fuzzy controller. We have evaluated the proposed routing algorithm under both synthetic and real application traffic profiles. The results reveal that the proposed scheme improves the performance significantly.

#### References

- [1] L. Benini, G. De Micheli, "Networks on Chips: A New SoC Paradigm," IEEE Computer, pp. 70-78, January 2002.
- [2] A. Jantsch and H. Tenhunen, "Networks on Chip," Kluwer Academic Publishers, 2003.
- [3] W. J. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," In Proc. of Design Automation Conference (DAC), pp. 684-689, June 2001.
- [4] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oberg, M. Millberg, and D. Lindqvist. "Network on chip: An architecture for billion transistor era," In Proc. of the IEEE Norchip Conf., pp. 120-124, November 2000.
- [5] S. Kumar, A. Jantsch, M. Millberg, J. Oberg, J. Soininen, M. Forsell, K. Tiensyrj, and A. Hemani. "A network on chip architecture and design methodology," In Proc. Symposium on VLSI, pp. 117-124, April 2002.
- [6] W. J. Dally, and B. Towles. "Route packets, not wires: on-chip interconnection networks," Design Automation Conference, pp. 685-689, June 2001.

- [7] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oberg, M. Millberg, and D. Lindqvist. "Network on a chip: an architecture for billion transistor era", In Proc. of the IEEE NorChip Conf., Nov. 2000.
- [8] S. Kumar, et al. "A network on chip architecture and design methodology," In Proc. of ISVLSI, pp. 117-124, April 2002.
- [9] J.Hu, R.Marculescu "DyAD Smart Routing for Networks-on-Chip", Design Automation Conference, pp. 260 – 263, 2004.
- [10] K. G. Shin and S. W. Daniel, "Analysis and implementation of hybrid switching", IEEE Tran. on Computers, pp. 684-692, 1996.
- [11] William J. Dally, Charles L. Seitz, "The torus routing chip", Distributed Computing, pp.187-196, 1986.
- [12] P. Kermani and L. Kleinrock, "Virtual cut-through: a new computer communication switching technique" In Computer Networks, volume 3, pp. 267-286, Sept. 1979.
- [13] E. Nilsson, M. Millberg, J. Oberg, and A. Jantsch, "Load distribution with the Proximity Congestion Awareness in a Network on Chip", In Proceedings of the Design Automation and Test in Europe Conference and Exhibition, 2003.
- [14] D. Wiklund and D. Liu, "SoCBUS: Switched Network on Chip for Hard Real Time Embedded Systems," Int'l. Parallel and Distributed Processing Symposium, pp. 78-84, April 2003.
- [15] Edwin Rijpkema, Kees Goossens, and Paul Wielage, "A Router Architecture for Network on Silicon", In Proceedings of Progress Workshop on Embedded Systems, 2001.
- [16] M. Li, Q.-A. Zeng, W.-B. Jone, "DyXY a proximity congestionaware deadlock-free dynamic routing method for network on chip," in: Proceedings of the Design Automation Conference, 2006, pp. 849–852.
- [17] P. Lotfi-kamran, A. Rahmani, M. Daneshtalab, A. Afzali-Kusha, Z. Navabi, "EDXY - A Smart Congestion-Aware and Link Failure Tolerant Routing Algorithm for Network-on-Chips," Journal of

Systems Architecture (JSA-elsevier), Vol. 56, No. 7, pp. 256-264, 2010.

- [18] P. Lotfi-Kamran, M. Daneshtalab, Z. Navabi, and C. Lucas, "BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs to Avoid Congestion," in Proceedings of 11th ACM/IEEE Design, Automation, and Test in Europe Conference (DATE), pp. 1408-1413, Mar 2008, Germany.
- [19] Earl Coax, "The fuzzy system handbook: A Practitioner's Guide to Building, Using, and Maintaining Fuzzy Systems"
- [20] J. Duato. New theory of deadlock-free adaptive routing in wormhole networks. IEEE Transaction on Parallel and Distributed Systems, Vol 4, No 12, pp. 1320-1331, Dec. 1993.
- [21] G.M. Chiu, The odd-even turn model for adaptive routing, IEEE Transactions on Parallel and Distributed Systems 11 (July) (2000), pp. 729–738.
- [22] C. J. Glass and L. M. Ni, "The turn model for adaptive routing", International Symposium on Computer Architecture, pp. 278-287, May 1992.
- [23] L.M. Ni, P.K. McKinley, A survey of wormhole routing techniques in direct networks, IEEE Computer (1993) 62–76.
- [24] M. O. Centre, M. Oussalah, V. Kreinovich, "A New Derivation of Centroid Defuzzification," in *Proc. of the 10th IEEE International Conference on Fuzzy Systems FUZZ-IEEE'2001*, Australia, December 2-5, Vol. 2, pp. 884-887, 2001.
- [25] A. Chandramohan, M. Rao, M. S. Arumugam, "Two New and Useful Defuzzification Methods Based on Root Mean Square Value," Journal Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 10, No. 11, pp. 1047-1059, 2006.
- [26] E.B. Van der Tol and E.G.T. Jaspers, "Mapping of MPEG-4 Decoding on a Flexible Architecture Platform," Proc. SPIE 2002, pp. 1–13, Jan. 2002.



(a)

(b)

Figure 6. Simulation Results under the (a) random and (b) hotspot traffic profiles with different packet sizes.



Figure 7. (a) VOPD block diagram, with communication BW annotated (in MB/s) and (b) its mapping onto mesh topology [26].



Figure 8. (a) MPEG4 decoder block diagram, with communication BW annotated (in MB/s) and (b) its mapping onto mesh topology [26].



Figure 9. Simulation results under two multimedia traffic profiles: MPEG and VOPD.