Skip to main content

How to speed up data transfer between servers

This article contains values as an example of a specific case. Your servers may have different configuration. Please use accordingly.

Over a network, although the network bandwidth could be 10 GigaBits, yet, if the servers have
1 GigE Ethernet network cards (1 GigaBit), the overall file-transfer
throughput is 1 GigaBits / sec, which means (1024*1024*1024 / 8) bytes
per sec = 125 MB/sec. This is the maximum throughput available on the network.


  • For convenience, the 2 esrvers we need to improve data transfer speeds between - we'd call one as source, & other target. Now, if we ping from a source to target server, we get about 42 ms of turnaround time.






  • This gives us maximum amount of data at any point of time possible on this network as (125 MB / s) * (0.042) = 5.250 MB . However, the default network level settings on the servers don’t let this much bandwith to be used. We may verify it with “ndd & route” commands (only SA can run these commands):



  • If you have root access to any locale Linux box, you may try "ndd get ..." & "route print" commands, otherwise ask the SA to provide you the outputs, to find out the current settings.



  • The **recv** & **xmit** (receive & transmit) values are the default ones, used by the server & maximum values are defined by **cwnd** & **max_buf. e.g. The 49152 value indicates that this server (or any RMAN job) would send / receive, by default, 48 KB of traffic only. Moreover, if someone is manually writing a program, may use the maximum of 1048576 bytes
    = 1 MB, by using the **max** variables)



  • Now, our purpose is to be able to use the whole 5.25 MB that can be used on the current
    connection. However, the allowed values for these 4 variables, should be a power of 2. Hence, we would pick the next higher power of 2, which is 8 MB (=2 ^ 3). We would need to set the **max** limit variables also equal to 8 MB.



  • DBA should ask SA to run these commands for setting the values, with “set” option, instead of “get”. E.g. “sudo ndd -set /dev/tcp tcp_recv_hiwat 8388608”



  • The “ndd” commands get / set values for ALL connections. We may not want to do that everytime,
    because if a non Netbackup / media server (say, a windows client) connects to this server, which has a lower capacity of being able to handle traffic (say, 1 MB), then the 5 MB packets would overwhelm the connection. Hence,
    to avoid this scenario, we should use “route” commands to set connections settings between specific IP’s , which don’t work the same as “ndd”.



  • It’s always safer to use “route” settings instead of
    “ndd”
    excepttcp_max_buf”. Here are some screenshots (please esarch internet to see more detailed syntax of route / ndd commands) :






  • The “sendpipe” option is same as “tcp_xmit_hiwat” of ndd & recvpipe is “tcp_recv_hiwat”.
    The “ssthresh” is same as “tcp_cwnd_max”. However, the “tcp_max_buf” still needs to be set through “ndd”.



  • With "route" settings, all these settings will be wiped out & will go back to default whenever a server is rebooted, so changes are temporary till next boot. With "ndd', it may be made permanent, which may be unnecessary, unless it's between some DB & RMAN backup server (viz. Netbackup) where there's a regular transfer of data.



  • "ndd" uses get / set options to show or edit values, whereas "route" uses print / add / delete options.



  • Now, having understood all the above concepts, when a DBA / SA is ready to start working on their specific environment of servers, you may follow this :

    1) 1st we need to get the current values from the SA, of the 4 parameters (to be executed from source / target server),
    a. ndd –get /dev/tcp tcp_recv_hiwat
    b. ndd –get /dev/tcp tcp_xmit_hiwat
    c. ndd –get /dev/tcp tcp_max_buf
    d. ndd –get /dev/tcp tcp_cwnd_max
    e. route get

    2) For “route get” output, these parameters are named sendpipe, recvpipe, ssthresh (there’s no equivalent of tcp_
    max_buf here – it can be set only through ndd command). “ndd” sets the parameters universally, for all IP’s, whereas “route” can be used to set between specific IP’s & temporarily.

    3) Along with this we need to know the server Ethernet card types (on both source & target servers) , viz. 1 GigE, 10GigE ; AND the network bandwidth , viz. a 10 Gbps line, 24 Mbps line etc.

    4) Then we need to make calculations based on the above network bandwidth values, taking into the consideration, an example calculation on my Blog - http://databaseoracle.blogspot.com/2009/06/how-to-speed-up-data-transfer-between.html .

    5) This way, we’d arrive at a number, which is the total bandwidth that CAN BE available between the 2 servers under consideration. We’ll call this number (if it is a power of 2) or the next power of 2, MAX-BANDWIDTH

    6) Then we request the SA to set the above parameters to new values (use route command, instead of ndd , to set these values temporarily) :
    a. route add –host … -sendpipe MAX-BANDWIDTH -recvpipe MAX-BANDWIDTH -ssthresh MAX-BANDWIDTH
    b. ndd –set /dev/tcp tcp_cwnd_max MAX-BANDWIDTH (this parameter is available only through ndd & SA may decide whether to set it or not, based on implications) .

    7) If the SA is responding in minutes, this whole calculation would take about 10-20 minutes.

    8) Please let me know in case of any doubts & your experience when you try these modifications.

    9) Happy transferring lightning-fast data !










    Popular posts from this blog

    Permitting a user on UNIX / Linux to use sqlplus

    We encountered an interesting scenario worth mentioning.

    Scenario:
    Oracle is installed on a Linux server newly & another ftp user wants to just use sqlplus also.Just setting the ORACLE_HOME, LD_LIBRARY_PATH, ORACLE_SID, PATH in the user's profile did not let him run the "sqlplus" command at all. The error shown was :
    $ sqlplus
    ld.so.1: sqlplus: fatal: /app/oracle/product/10.2/lib/libsqlplus.so: Permission
    denied
    Killed
    The files under "lib" directory under home didn't have permission for anybody other than "oracle" user. -rwxr-x--- 1 oracle oinstall 1320016 Jun 28 2005 /app/oracle/product/10.2/lib/libsqlplus.so
    When we permit the other users (other than oracle) to read/execute…

    HWM (high watermark) of a tablespace ?

    How many times have you thought of calculating HWM (high watermark) of a tablespace as a DBA ? Have you ever found out that a tablespace is overallocated in a database, as compared to the data being used in it ? In such a case, you'd want to reduce the wasted space & resize the tablespace to a smaller size, although greater than the current usage.
    HWM is at the maximum blocks ever occupied by a SEGMENT (unless we perform a truncate on it). Segments span over datafiles however remain within a tablespace. All segments are physically allocated by way of extents, which are contiguous blocks of data. Now, when we talk about contiguous blocks, 2 datafiles (even from same tablespace) are highly unlikely to have contiguguos data blocks. Hence, theoretically, it's possible to calculate high watermark within a datafile, not a tablespace directly, as the latter is only a logical unit. Although, we can resize a tablespace, however the resize value needs to be based on exact calculati…

    Intereseting --- 249 days bug for Oracle on Linux servers

    --contributed by Manjunath Ulthi, Senior DBA at Infosys Technologies limited

    1 Introduction

    Problem: OCI client spins when machine uptime >= 249 days in Linux machines. Once the uptime crosses 249 days the database client hangs and leads to high CPU utilization and server hung state. The user will not be able to connect to the databases This is oracle known bug 4612267.


    Solution: This can be handled by applying the patch 4612267.


    The patch has been applied in the servers server126, server041 & server120 and tested. After the patch has been applied the client connections were normal after 249 days and all the operations of databases/applications were normal and didn’t encounter any problems.

    For further information refer the Metalink Notes

    Note: 338461.1: SQL*Plus With Instant Client 10.2.0.1 Hangs, When System Uptime Is More Than 248 Days

    Note:4612267.8 : Bug 4612267 - OCI client spins when machine uptime >= 249 days

    2 General Instructions

    Procedure to apply the patch

    1. Download the …