TIBCO FTL® is designed to provide a flexible approach to data distribution and resiliency. With TIBCO FTL, clients can communicate seamlessly with other clients, both persistently and non-persistently. By design, TIBCO FTL decouples how data is transmitted from the application itself, allowing application developers to focus on what data is of interest, not on how that data is distributed.
One challenge this presents, however, is that applications built on FTL as their communication infrastructure can be distributed across multiple data centers and locations around the globe. Administratively, TIBCO FTL has built-in mechanisms to validate and maintain all connections in the system. However, some application infrastructures need more insight into how these validations occur in order to understand how the infrastructure will behave if communication is lost between disparate locations.
In the TIBCO FTL Realm, two pairs of parameters, each consisting of a heartbeat and a timeout, can be configured to control how FTL manages connections. These parameters are found under the Realm Properties heading of the Realm GUI, and they govern the interaction from client to server (Client -> FTL Server) and from server to client (FTL Server -> Client).
When looking purely at the interaction between an FTL client application and the FTL server, these parameters are fairly intuitive:
The Client -> FTL Server Heartbeat is the interval at which a client application sends a heartbeat to the FTL server process when no data is flowing. If data is flowing between the client application and the FTL server, that traffic resets the heartbeat interval; a heartbeat needs to be sent only when the link is idle, to confirm that the connection is still valid. By default, the Client -> FTL Server Heartbeat is 60 seconds, which means that when no data is flowing, the client sends a heartbeat to the FTL server every 60 seconds to validate the connection.
The Client -> FTL Server Timeout is the amount of time a client application will wait for a response to a heartbeat sent to the FTL server process. If the client application sends a heartbeat and sees no response within the timeout period, it closes the existing connection to the FTL server and triggers the FTL client reconnection logic. By default, the Client -> FTL Server Timeout is 180 seconds (3 times the heartbeat interval). With these defaults, the client can miss at most 2 heartbeats before it closes the connection and starts the retry logic.
The FTL Server -> Client Heartbeat is the interval at which an FTL server sends a heartbeat to the client application process when no data is flowing. If data is flowing between the FTL server and the client application, that traffic resets the heartbeat interval; a heartbeat needs to be sent only when the link is idle, to confirm that the connection is still valid. By default, the FTL Server -> Client Heartbeat is 60 seconds, which means that when no data is flowing, the FTL server sends a heartbeat to the client application every 60 seconds to validate the connection.
The FTL Server -> Client Timeout is the amount of time the FTL server will wait for a response to a heartbeat sent to the client application process. If the FTL server sends a heartbeat and sees no response within the timeout period, it closes the existing connection to the client application. Unlike the Client -> FTL Server Timeout, the FTL Server -> Client Timeout does not trigger any reconnection logic on the FTL server. This is by design: the FTL server does not drive connection initialization and leaves connection establishment to the client applications. The FTL server does, however, use this timeout to reap connections that are no longer valid. By default, the FTL Server -> Client Timeout is 3600 seconds (60 times the heartbeat interval), which means the FTL server will wait through 60 heartbeat intervals without a response from the client before it closes the connection. This timeout is deliberately large (1 hour by default) so that the FTL server can maintain connections with transient clients as needed. Administrators can reduce it if they want the FTL server to reap stale connections more quickly.
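The four parameters can be modeled with a short Python sketch. This is not FTL code or an FTL API: `LinkTimers` and `reap` are hypothetical names, and the sketch assumes that any traffic on the link restarts the heartbeat interval and that each timeout is measured from the last time anything was heard from the peer. The client side closes a silent link (and would then hand off to its reconnection logic), while the server side only reaps silent clients and never reconnects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinkTimers:
    """Hypothetical model of one direction's heartbeat/timeout pair (not an FTL API).

    Assumptions: any traffic on the link restarts the heartbeat interval, and the
    timeout is measured from the last time anything was heard from the peer.
    """

    heartbeat: float            # idle seconds before a heartbeat is sent
    timeout: float              # seconds of silence before the link is closed
    last_traffic: float = 0.0   # last message sent or received on the link
    last_heard: float = 0.0     # last message received from the peer

    def on_send(self, now: float) -> None:
        # Outbound data counts as traffic, so no heartbeat is needed yet.
        self.last_traffic = now

    def on_receive(self, now: float) -> None:
        # Inbound data or a heartbeat response proves the peer is alive.
        self.last_traffic = now
        self.last_heard = now

    def heartbeat_due(self, now: float) -> bool:
        # Heartbeats are only needed after a full interval with no traffic.
        return now - self.last_traffic >= self.heartbeat

    def expired(self, now: float) -> bool:
        # Nothing heard from the peer for a full timeout: close the link.
        return now - self.last_heard >= self.timeout


def reap(server_side: Dict[str, LinkTimers], now: float) -> List[str]:
    """Server side: drop silent clients without ever reconnecting to them."""
    stale = [cid for cid, link in server_side.items() if link.expired(now)]
    for cid in stale:
        del server_side[cid]
    return stale


# Client -> FTL Server defaults: heartbeat 60s, timeout 180s.
client = LinkTimers(heartbeat=60.0, timeout=180.0)
client.on_send(30.0)
print(client.heartbeat_due(60.0))   # False: data at t=30s reset the interval
print(client.expired(180.0))        # True: nothing heard since t=0, so close and reconnect

# FTL Server -> Client defaults: heartbeat 60s, timeout 3600s (hypothetical client IDs).
server_side = {
    "client-a": LinkTimers(heartbeat=60.0, timeout=3600.0, last_heard=0.0),
    "client-b": LinkTimers(heartbeat=60.0, timeout=3600.0, last_heard=3000.0),
}
print(reap(server_side, 3700.0))    # ['client-a']
```

The asymmetry in the two demonstrations mirrors the text: the same kind of timer drives both directions, but only the client treats expiry as a reason to reconnect.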
The second layer to take into consideration is the interaction between FTL server clusters. Operations such as persistent store forwarding and cluster-to-cluster communication need the same heartbeats and timeouts to validate the health of the connections between clusters. FTL server clusters use the Client -> FTL Server Heartbeat and Client -> FTL Server Timeout values for all connection validation between clusters. In other words, FTL server clusters act like client applications: by default, each sends a heartbeat every 60 seconds when no data is flowing and times out the connection if no response from the other cluster is seen within 180 seconds. A cluster can therefore miss at most 2 heartbeats before triggering the reconnection logic that re-establishes the cluster-to-cluster connections.
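Which pair of values governs which kind of link can be summarized as a small lookup. The `Link` enum, the `HeartbeatSettings` type, and `settings_for` are illustrative names, not part of any FTL API; the values are the defaults described in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Link(Enum):
    """Kinds of connection discussed in this article (illustrative names)."""
    CLIENT_TO_SERVER = "Client -> FTL Server"
    SERVER_TO_CLIENT = "FTL Server -> Client"
    CLUSTER_TO_CLUSTER = "FTL Server cluster -> FTL Server cluster"


@dataclass(frozen=True)
class HeartbeatSettings:
    heartbeat: float  # idle seconds before a heartbeat is sent
    timeout: float    # seconds without a response before the link is closed


# Realm defaults described in this article, in seconds.
CLIENT_SETTINGS = HeartbeatSettings(heartbeat=60.0, timeout=180.0)
SERVER_SETTINGS = HeartbeatSettings(heartbeat=60.0, timeout=3600.0)


def settings_for(link: Link) -> HeartbeatSettings:
    # Only the server -> client direction uses the FTL Server -> Client values.
    # Cluster-to-cluster links reuse the Client -> FTL Server values, because
    # each cluster behaves like a client of the other.
    if link is Link.SERVER_TO_CLIENT:
        return SERVER_SETTINGS
    return CLIENT_SETTINGS


for link in Link:
    s = settings_for(link)
    print(f"{link.value}: heartbeat {s.heartbeat:.0f}s, timeout {s.timeout:.0f}s")
```

A practical consequence of this mapping is that tuning the Client -> FTL Server values changes failure detection for both client applications and inter-cluster links at the same time.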
The last thing to note is that with client applications, both the FTL server and the client application need to send heartbeats so that each side of the connection can determine whether the socket connection it holds is valid. While FTL servers do not try to reconnect to FTL clients, the FTL server still needs to send heartbeats to the clients to validate the connections that have been established. FTL clients use the Client -> FTL Server heartbeats to monitor the status of their connection to the FTL server, and if that connection is no longer available, the client will try to reconnect to the FTL server based on its client configuration. For connection validation between FTL server clusters, each side validates the connection using the Client -> FTL Server Heartbeat and Client -> FTL Server Timeout, and each side also has native reconnection logic built in. This allows faster detection and recovery when a connection is lost. When a connection loss occurs, the FTL server retry logic immediately tries to re-establish the connection; if the connection cannot be established, the cluster continues trying to re-establish it every 2 seconds.
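The cluster retry behavior described above (one attempt immediately, then one every 2 seconds until the link is back) can be sketched as a simple loop. This is an illustration, not FTL's implementation: `reestablish` is a hypothetical function, and the `connect` and `sleep` callables are injected so the loop can be exercised without a real network.

```python
import time
from typing import Callable


def reestablish(connect: Callable[[], bool],
                retry_interval: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical sketch of the cluster-to-cluster retry loop (not FTL code).

    Makes one attempt immediately, then retries every `retry_interval` seconds
    until `connect` reports success. Returns the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        if connect():              # the first attempt happens with no delay
            return attempts
        sleep(retry_interval)      # wait before the next attempt


# Simulated peer cluster that becomes reachable on the third attempt.
outcomes = iter([False, False, True])
waits = []
print(reestablish(lambda: next(outcomes), sleep=waits.append))  # 3
print(waits)                                                     # [2.0, 2.0]
```

Passing `waits.append` as the sleep function records each pause instead of actually waiting, which keeps the example fast and deterministic while still showing the immediate first attempt followed by fixed 2-second retries.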