Article ID: KB0093416
Description
Resolution:
If the article "Best Practices for Running TIBCO Rendezvous on Solaris OS" cannot be found on the web, the following is a copy.
============================================
Best Practices for Running TIBCO Rendezvous on Solaris OS.
By Amol Khire, June 2004
Introduction
TIBCO Rendezvous is widely used in the industry for mission-critical, real-time messaging. Typically, the performance and reliability requirements of applications using Rendezvous can be demanding and hence may stress the underlying operating system, as well as the hardware platform. This could make it necessary to fine-tune the system to maximize performance and availability in the most demanding environments.
This document explains best practices for running TIBCO Rendezvous on the Solaris Operating System (OS). As always, best practices that are recommended today may outlive their usefulness as newer and better versions of the applications, OS, and hardware become available. It is therefore imperative to understand the underlying rationale for setting a particular parameter. The second section of this document attempts to explain the inner workings of the Solaris OS. This should help the user understand the significance of a particular parameter and make a more informed decision with respect to its use.
Background
Rendezvous is a real-time message-oriented middleware application. In the Solaris OS, a real-time thread has a higher priority than any system or time-share thread, and real-time threads take priority for getting scheduled on the CPU. For optimal performance, it is important that the real-time application threads have access to adequate processing power. Performance characterization and analysis was done on Solaris platforms running Rendezvous to suggest best practices that can be used for optimal performance.
Suggested Best Practices
Please note: The suggested tuning parameters assume that the underlying operating system is the Solaris 8 OS or later.
Best Practices for Minimizing UDP Packet Drops
Rendezvous uses the UDP protocol for its messaging. Under certain conditions, UDP packets may be dropped on the system receiving them over the network. Packet drops can manifest in various ways, some of which may not be obvious to the user: for example, UDP packets may be dropped because the UDP buffer is filling up, or they may be dropped at the network interface card (NIC) driver. The following tuning parameters help minimize packet drops on Solaris OS platforms.
1. Creating a Processor Set and Disabling Interrupts on That Set
In the Solaris OS, interrupts have the highest scheduling priority (higher than real-time threads). This ensures that high-priority system events are handled immediately and in a deterministic way. The interrupts can come from various sources; some of them are network interrupts by the NIC driver when packets arrive over the network, and some are disk interrupts for reads/writes to the disk. If the interrupt lands on the CPU where the UDP packet receiving thread was running, it will interrupt the UDP receiver thread. If this occurs too frequently, it will result in the UDP receiver thread getting less of the CPU than it needs for its packet-processing requirements. Having less processing power means that the underlying UDP buffer will fill up, since packets will not be draining out fast enough, eventually resulting in packets being dropped.
How to Prevent Interrupts From Interrupting Real-Time Threads
Creating a Dedicated Processor Set for Rendezvous Daemon
Create a processor set with two CPUs and bind the Rendezvous daemon process to the processor set. Also, disable interrupts on the CPUs in this processor set so that no interrupts land on the processors in this set. However, note that creating a dedicated processor set for a process means that no other processes can run on these processors.
Simply put, if the Rendezvous daemon is not using the processors in the processor set, their CPU power cannot be used by any other process. This may present a hurdle when sizing systems to maximize the available CPU power. Also, this approach is applicable only when the system has more than two processors online, so that at least two can be dedicated to the processor set.
a. Obtain the available CPU IDs (only CPUs in the on-line or no-intr state are available for creating the processor set).
# psrinfo
b. Create a processor set with two CPUs (from the ones that are available).
# psrset -c <CPU_ID_A> <CPU_ID_B>
This command prints the ID of the processor set it creates.
c. Disable interrupts on this processor set:
# psrset -f <PSRSET_ID>
d. Bind the Rendezvous daemon process ID to this processor set.
# psrset -b <PSRSET_ID> <RVD_PID>
This will ensure that the Rendezvous daemon gets a dedicated processor set with two CPUs and that the interrupt threads are not scheduled on the same CPU as the Rendezvous daemon threads.
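For illustration, here is the complete sequence on a hypothetical machine. The CPU IDs (2 and 3), the processor set ID (1), and the Rendezvous daemon PID (1234) are placeholders, and the output shown is indicative rather than exact; substitute the values reported by psrinfo and ps on your system.
# psrset -c 2 3
created processor set 1
processor 2: was not assigned, now 1
processor 3: was not assigned, now 1
# psrset -f 1
# psrset -b 1 1234
process id 1234: was not bound, now 1
# psrset -i 1
user processor set 1: processors 2 3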
2. Device Driver Tuning Parameters
If it is not feasible to create a processor set, we can still minimize the effect that interrupts have on the scheduling of real-time threads. Typically, a system running a messaging application receives a lot of packets over the network, and the NIC driver raises an interrupt to service them. The device driver's configurable parameters can be tuned to reduce the number of interrupts generated, and the driver FIFO can be tuned so that more packets are queued in the buffers.
3. Tuning Parameters for the GigaSwift Ethernet (Cassini) Driver
Increase the FIFO Depth
In /etc/system increase the FIFO depth by setting:
set ce:ce_srv_fifo_depth=8192
This will reduce packet drops at the NIC by buffering more packets. Note that changes to /etc/system take effect at the next reboot.
Interrupt Coalescing
Increase the timeout for the interrupt timer to fire. This results in fewer interrupts being generated.
# ndd -set /dev/ce rx_intr_time 30
Here rx_intr_time is specified in units of system clock divided by 512.
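Note that parameters set with ndd do not persist across a reboot; a common practice is to re-apply them from a boot-time rc script. The current value can be read back at any time (ndd performs a get when -set is omitted); the output below is illustrative:
# ndd /dev/ce rx_intr_time
30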
4. Do Not Use /proc Tools and System Monitoring Tools That Use /proc Tools
The /proc tools such as pmap, pfiles, and pstack stop a process while examining it. Stopping the Rendezvous daemon, even for a brief period, can result in large packet losses.
Suggestions:
Do not use proc tools like pmap, pstack, and pfiles.
Limit the use of tools and utilities that read the /proc files, such as top, prstat, and ps.
5. UDP Tuning Parameters
Increase the UDP maximum buffer size to 1 Mbyte (the default of 256 Kbyte is too low, especially for a Gigabit Ethernet network). The ndd tuning parameter for this is udp_max_buf; set it to 1 Mbyte (1048576 bytes).
# ndd -set /dev/udp udp_max_buf 1048576
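To verify the setting, read it back with ndd; and to watch for UDP buffer overflows (the udpInOverflows counter discussed later in this article), check the protocol statistics. The output shown is illustrative:
# ndd /dev/udp udp_max_buf
1048576
# netstat -s | grep udpInOverflows
        udpInOverflows      =      0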
6. fsflush Tuning Parameters
fsflush is a Solaris OS kernel daemon that periodically scans through memory and flushes the modified pages in the file system cache to disk. At each pass of the fsflush daemon, if there are a large number of dirty pages in the file system cache, fsflush will cause increased disk activity while flushing the pages to disk. This in turn will cause increased disk interrupts, which may interrupt the UDP receiver thread. In such a case it is best to reduce the interference caused by file system flushing by the fsflush daemon.
Add the following tuning parameters to the /etc/system file.
set tune_t_fsflushr = 5
set autoup = 300
The tune_t_fsflushr parameter specifies the interval at which the fsflush daemon wakes up to flush modified pages to disk, and autoup specifies how often, in seconds, all of memory should be examined. At each wakeup, fsflush therefore scans a fraction tune_t_fsflushr/autoup (here 5/300, or 1/60) of the file system pages and flushes the modified ones to disk every tune_t_fsflushr seconds.
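Since changes to /etc/system take effect only at the next reboot, you may want to confirm the values the running kernel is actually using. As a minimal sketch, the kernel variables can be inspected with mdb; the output format shown is indicative:
# echo "tune_t_fsflushr/D" | mdb -k
tune_t_fsflushr:
tune_t_fsflushr:                5
# echo "autoup/D" | mdb -k
autoup:
autoup:         300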
Best Practices for Maximizing Performance
1. TCP Watermarks
Applications use the TCP/IP protocol to send data to and receive data from the Rendezvous daemon. Setting the TCP high and low watermarks to an optimal value can achieve a higher throughput (messages/sec) between the Rendezvous daemon and the applications. The performance improvement from these parameters depends on the message size, and the user should experiment with the recommended values to find the best performance. For applications sending messages that vary in size, a setting of 64 Kbyte appears to be a good value.
Setting the Tuning Parameter:
# ndd -set /dev/tcp tcp_xmit_hiwat 65536
# ndd -set /dev/tcp tcp_recv_hiwat 65536
2. tcp_maxpsz_multiplier
When an application sends a packet over the wire, there is a cost involved in copying the packet from the user buffer to the kernel buffer before it can be sent. This parameter specifies how many maximum segment size (MSS)-sized packets are copied from user space to the kernel at a time. Increasing this parameter means that fewer copy operations are performed, with more data copied per operation.
# ndd -set /dev/tcp tcp_maxpsz_multiplier 10
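As with the other ndd parameters, these TCP settings do not survive a reboot and should be re-applied at boot time. They can be read back to confirm they took effect; the values shown assume the settings above were applied:
# ndd /dev/tcp tcp_xmit_hiwat
65536
# ndd /dev/tcp tcp_recv_hiwat
65536
# ndd /dev/tcp tcp_maxpsz_multiplier
10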
Solaris Internals
Solaris OS Global Priority Model
The Solaris OS kernel schedules threads according to their global priorities (see Figure 1). The higher-priority threads take precedence over lower-priority threads when it comes to scheduling them for a slice of the CPU. The threads are assigned to different scheduling classes, such as time-share (TS), system (SYS), and real-time (RT), among others. Depending on their priorities, the threads are scheduled on the available CPUs.
Figure 1: Global Priorities (source: Chien Yen)
What Are Interrupts?
An interrupt is a mechanism used by the peripheral devices to inform the CPU of task completion, error conditions, and other events that require immediate attention from the CPU. Interrupts are generated asynchronously and are processed in the interrupt context. In the Solaris OS, interrupt threads have the highest priority, higher than threads in any other scheduling classes (see Figure 1). For example, whenever packets are received over the network, the NIC generates interrupts.
How Do Interrupts Affect Rendezvous Performance?
An interrupt is generated whenever an event requiring the CPU's immediate attention occurs. Since interrupts have a higher scheduling priority than any other scheduling class, if the application's real-time thread is scheduled on a CPU that is getting a lot of interrupts, the real-time thread will be interrupted frequently.
When a thread that is currently executing on a CPU is interrupted, the kernel avoids context-switching out the interrupted thread. The interrupt (running as a thread) borrows the LWP of the executing thread, and the interrupted thread is said to be "pinned" to the CPU. The interrupted thread cannot run on any other processor until the interrupt handler completes or blocks on a synchronization object. If the interrupt load on the CPU running the real-time thread is high, the thread will be "pinned" frequently (see Solaris Internals in the References section). Also, if the real-time thread is always runnable, the scheduler will not migrate it away from the interrupt-prone CPU to a CPU that is not handling interrupts or is idle.
All these factors add up; together they may result in the real-time thread getting less than its fair share of the CPU and eventually dropping packets.
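One way to see whether the CPU running the real-time thread is absorbing a heavy interrupt load is mpstat, whose intr and ithr columns report interrupts and interrupt threads handled per CPU. In this abridged, illustrative sample, CPU 0 is taking nearly all of the interrupt load:
# mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    5   0    12  6200 5800  110    8    3    9    0   260    6  40   2  52
  1    3   0     8    45    2   95    5    2    7    0   180   78  15   0   7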
Device Driver Internals
Apart from interrupts, there is also a possibility of the Gigabit Ethernet device driver dropping packets after they arrive at the network interface. These drops do not show up as the udpInOverflows statistic and hence may go unnoticed. First, let us understand how the packet is delivered, from the moment it arrives over the network to the point when it is delivered to the application socket.
1. When a packet arrives over the network, an interrupt is generated. The interrupt thread then interrupts the currently executing thread on the CPU (that is, the CPU which is assigned to handle this particular interrupt). For some Gigabit Ethernet Drivers (like that for the GigaSwift Ethernet Card) the interrupt thread puts the packet in a FIFO (for systems having four CPUs or more).
2. The device driver "worker threads" (which run in the system, that is, the SYS priority class) then pick up the packets from the FIFO and deliver them "upstream" to the next module in the networking stack (as shown in Figure 2).
3. The UDP receiver thread reads the packet off the socket buffer.
Figure 2: How a Packet Is Processed
Note: All the above threads may run on the same or different CPUs depending on how they are scheduled. The preceding diagram assumes that they are competing for a time-slice of the same CPU.
Now that we know how the packet is processed, let us examine the scenarios that may lead to packet drop.
Scenario 1:
It is possible that the device driver worker threads do not get enough of the CPU to drain the packets from the FIFO. Meanwhile, the interrupt thread will continue to receive packets off the network and put them in the FIFO. In such a scenario the FIFO will become full, which shows up as the rx_pkts_dropped statistic being incremented in the kernel statistics (kstat) of the GigaSwift Ethernet device driver. The solution here is to increase the FIFO depth so that more packets can be queued before being dropped.
In the /etc/system file, increase the FIFO depth by setting:
set ce:ce_srv_fifo_depth=8192
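Whether the FIFO is overflowing can be checked with kstat; the instance number (0) and the statistic path below are illustrative placeholders for your ce interface:
# kstat -p ce:0 | grep rx_pkts_dropped
ce:0:ce0:rx_pkts_dropped        0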
Scenario 2:
If interrupts land on the CPU that is running the real-time UDP receiver thread, they will interrupt the real-time thread. If this happens too frequently, it may result in the real-time thread getting less of the CPU time-slice than is necessary for its packet-processing requirements. This may result in the UDP buffer filling up, eventually leading to packet drops. To minimize the effect of the network interrupts, we can reduce the number of interrupts generated by setting the device driver tuning parameter rx_intr_time to a higher value.
Use ndd to set this tuning parameter:
# ndd -set /dev/ce rx_intr_time 30
Note: rx_intr_time is specified in units of system clock divided by 512.
Tips for Sizing and Capacity Planning
Effect of Small Packet Traffic on Performance
Messages sent by the application are transmitted as packets over the network. The maximum size of a packet that can be sent over the wire is called a maximum transmission unit (MTU).
Irrespective of the network bandwidth, CPU consumption limits the number of packets that can be sent over the wire. This should be considered during sizing and capacity planning, as shown in the following example. Assume that the MTU is 1500 bytes. Also assume that the user, Mary, is sending a message of 1500 bytes (including UDP/IP/Ethernet headers) over the wire. If Mary can send 10,000 messages/sec of size 1500 bytes, she may assume that she can send roughly the same number of messages of size 1501 bytes (MTU + 1). However, since the message size now exceeds the MTU, each message (of size 1501 bytes) will be split into two packets over the wire. This results in significantly lower throughput for messages of this size, as compared to messages that fit in an MTU-sized packet.
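The MTU of an interface can be confirmed with ifconfig; here ce0 and the addresses shown are placeholders:
# ifconfig ce0
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.1.10 netmask ffffff00 broadcast 192.168.1.255
At the same message rate, 1501-byte messages generate twice as many packets on the wire as 1500-byte messages, roughly doubling the per-packet CPU cost.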
Upgrading to a Gigabit Network
Avoid upgrading to Gigabit Ethernet unless the application performance is unacceptable at a lower network bandwidth. In the case of a Gigabit Ethernet network, ensure that the message traffic is not too "bursty." Here's an example to help illustrate this issue: An application sending 100 megabits of data can send the data over a gigabit network as a "burst" lasting 0.1 second, or over a 100 Mbit/sec network in one second. For the gigabit network, the transmission rate is 1 Gbit/sec, while for the 100BaseT network the transmission rate is 100 Mbit/sec. However, when "averaged" over a period of one second, the transmission rate in both cases is 100 Mbit/sec. If the receiver on the gigabit network is not capable of handling the sudden burst of data (at 1 Gbit/sec), it may drop packets, while the receiver on a 100BaseT network may be able to handle the traffic.
Other Factors
Factors like the capacity and buffering of the switch or router that handles the packets also play an important part. This is especially true for IP multicast traffic; not all switches or routers have the capacity to handle large volumes of IP multicast traffic effectively.
References
1. Solaris Internals by James Mauro and Richard McDougall
2. Solaris Processor Sets Made Easy by Dr. Matthias Laux
Acknowledgments
Thanks to Bob Sneed, Roch Bourbonnais, and Paul Riethmuller of the Performance and Availability Engineering group at Sun Microsystems for their analysis and suggestions for tuning parameters. Thanks also to Chien Yen for his diagram of global priorities.
About the Author
Amol Khire is a Sun Microsystems engineer working with independent software vendors (ISVs) on improving the performance, scalability, and availability of their applications on Sun platforms. His interests include real-time messaging, RFID, and enterprise application integration (EAI) technologies.
Issue/Introduction
Best Practices for Running TIBCO Rendezvous on Solaris OS.