Science and technology

A distributed database load-balancing structure with ShardingSphere

Apache ShardingSphere is a distributed database ecosystem that transforms any database right into a distributed database and enhances it with information sharding, elastic scaling, encryption, and different capabilities. In this text, I show construct a distributed database load-balancing structure primarily based on ShardingSphere and the affect of introducing load balancing.

The structure

A ShardingSphere distributed database load-balancing structure consists of two merchandise: ShardingSphere-JDBC and ShardingSphere-Proxy, which may be deployed independently or in a hybrid structure. The following is the hybrid deployment structure:

(Wu Weijie, CC BY-SA 4.0)

ShardingSphere-JDBC load-balancing answer

ShardingSphere-JDBC is a light-weight Java framework with extra companies within the JDBC layer. ShardingSphere-JDBC provides computational operations earlier than the applying performs database operations. The utility course of nonetheless connects on to the database by means of the database driver.

As a consequence, customers haven’t got to fret about load balancing with ShardingSphere-JDBC. Instead, they’ll give attention to how their utility is load balanced.

ShardingSphere-Proxy load-balancing answer

ShardingSphere-Proxy is a clear database proxy that gives companies to shoppers over the database protocol. Here’s ShardingSphere-Proxy as a standalone deployed course of with load balancing on prime of it:

(Wu Weijie, CC BY-SA 4.0)

Load balancing answer necessities

The key level of ShardingSphere-Proxy cluster load balancing is that the database protocol itself is designed to be stateful (connection authentication standing, transaction standing, Prepared Statement, and so forth).

If the load balancing on prime of the ShardingSphere-Proxy can not perceive the database protocol, your solely possibility is to pick out a four-tier load balancing proxy ShardingSphere-Proxy cluster. In this case, a particular proxy occasion maintains the state of the database connection between the shopper and ShardingSphere-Proxy.

Because the proxy occasion maintains the connection state, four-tier load balancing can solely obtain connection-level load balancing. Multiple requests for a similar database connection can’t be polled to a number of proxy situations. Request-level load balancing will not be attainable.

This article doesn’t cowl the small print of four- and seven-tier load balancing.

Recommendations for the applying layer

Theoretically, there isn’t any practical distinction between a shopper connecting on to a single ShardingSphere-Proxy or a ShardingSphere-Proxy cluster by means of a load-balancing portal. However, there are some variations within the technical implementation and configuration of the completely different load balancers.

For instance, within the case of a direct connection to ShardingSphere-Proxy with no restrict on the whole time a database connection session may be held, some Elastic Load Balancing (ELB) merchandise have a most session maintain time of 60 minutes at Layer 4. If an idle database connection is closed by a load balancing timeout, however the shopper will not be conscious of the passive TCP connection closure, the applying might report an error.

Therefore, along with issues on the load balancing stage, you may contemplate measures for the shopper to keep away from the affect of introducing load balancing.

On-demand connection creation

If a connection’s occasion is created and used repeatedly, the database connection will probably be idle more often than not when executing a timed job with a one-hour interval and a brief execution time. When a shopper itself is unaware of modifications within the connection state, the lengthy idle time will increase the uncertainty of the connection state. For situations with lengthy execution intervals, contemplate creating connections on demand and releasing them after use.

Connection pooling

General database connection swimming pools have the flexibility to take care of legitimate connections, reject failed connections, and so forth. Managing database connections by means of connection swimming pools can scale back the price of sustaining connections your self.

Enable TCP KeepAlive

Clients usually help TCP KeepAlive configuration:

  • MySQL Connector/J helps autoReconnect or tcpKeepAlive, which aren’t enabled by default.
  • The PostgreSQL JDBC Driver helps tcpKeepAlive, which isn’t enabled by default.

Nevertheless, there are some limitations to how TCP KeepAlive may be enabled:

  • The shopper doesn’t essentially help the configuration of TCP KeepAlive or computerized reconnection.
  • The shopper doesn’t intend to make any code or configuration changes.
  • TCP KeepAlive relies on the working system implementation and configuration.

User case

Recently, a ShardingSphere neighborhood member offered suggestions that their ShardingSphere-Proxy cluster was offering companies to the general public with upper-layer load balancing. In the method, they discovered issues with the connection stability between their utility and ShardingSphere-Proxy.

Problem description

Assume the consumer’s manufacturing surroundings makes use of a three-node ShardingSphere-Proxy cluster serving purposes by means of a cloud vendor’s ELB.

(Wu Weijie, CC BY-SA 4.0)

One of the purposes is a resident course of that executes timed jobs, that are executed hourly and have database operations within the job logic. The consumer suggestions is that every time a timed job is triggered, an error is reported within the utility log:

ship of 115 bytes failed with errno=104 Connection reset by peer
Checking the ShardingSphere-Proxy logs, there are not any irregular messages.

The difficulty solely happens with timed jobs that execute hourly. All different purposes entry ShardingSphere-Proxy usually. As the job logic has a retry mechanism, the job executes efficiently after every retry with out impacting the unique enterprise.

Problem evaluation

The purpose why the applying reveals an error is evident—the shopper is sending information to a closed TCP connection. The troubleshooting purpose is to establish precisely why the TCP connection was closed.

If you encounter any of the three causes listed beneath, I like to recommend that you simply carry out a community packet seize on each the applying and the ShardingSphere-Proxy aspect inside a couple of minutes earlier than and after the purpose at which the issue happens:

  • The downside will recur on an hourly foundation.
  • The difficulty is community associated.
  • The difficulty doesn’t have an effect on the consumer’s real-time operations.

Packet seize phenomenon 1

ShardingSphere-Proxy receives a TCP connection institution request from the shopper each 15 seconds. The shopper, nevertheless, sends an RST to the proxy instantly after establishing the reference to three handshakes. The shopper sends an RST to the proxy with none response after receiving the Server Greeting and even earlier than the proxy has despatched the Server Greeting.

(Wu Weijie, CC BY-SA 4.0)

However, no site visitors matching the above conduct exists within the application-side packet seize outcomes.

By consulting the neighborhood member’s ELB documentation, I discovered that the above community interplay is how that ELB implements the four-layer well being examine mechanism. Therefore, this phenomenon will not be related to the issue on this case.

(Wu Weijie, CC BY-SA 4.0)

Packet seize phenomenon 2

The MySQL connection is established between the shopper and the ShardingSphere-Proxy, and the shopper sends an RST to the proxy throughout the TCP connection disconnection part.

(Wu Weijie, CC BY-SA 4.0)

The above packet seize outcomes reveal that the shopper first initiated the COM_QUIT command to ShardingSphere-Proxy. The shopper disconnected the MySQL connection primarily based on however not restricted to the next attainable situations:

  • The utility completed utilizing the MySQL connection and closed the database connection usually.
  • The utility’s database connection to ShardingSphere-Proxy is managed by a connection pool, which performs a launch operation for idle connections which have timed out or have exceeded their most lifetime. As the connection is actively closed on the applying aspect, it doesn’t theoretically have an effect on different enterprise operations except there’s a downside with the applying’s logic.

After a number of rounds of packet evaluation, no RSTs had been despatched to the shopper by the ShardingSphere-Proxy within the minutes earlier than and after the issue surfaced.

Based on the out there data, it is attainable that the connection between the shopper and ShardingSphere-Proxy was disconnected earlier, however the packet seize time was restricted and didn’t seize the second of disconnection.

Because the ShardingSphere-Proxy itself doesn’t have the logic to actively disconnect the shopper, the issue is being investigated at each the shopper and ELB ranges.

Client utility and ELB configuration examine

The consumer suggestions included the next extra data:

  • The utility’s timed jobs execute hourly, the applying doesn’t use a database connection pool, and a database connection is manually maintained and offered for ongoing use by the timed jobs.
  • The ELB is configured with 4 ranges of session maintain and a session idle timeout of 40 minutes.

Considering the frequency of execution of timed jobs, I like to recommend that customers modify the ELB session idle timeout to be larger than the execution interval of timed jobs. After the consumer modified the ELB timeout to 66 minutes, the connection reset downside now not occurred.

If the consumer had continued packet capturing throughout troubleshooting, it is seemingly they’d have discovered ELB site visitors that disconnects the TCP connection on the fortieth minute of every hour.

Problem conclusion

The shopper reported an error Connection reset by peer Root trigger.

The ELB idle timeout was lower than the timed process execution interval. The shopper was idle for longer than the ELB session maintain timeout, ensuing within the connection between the shopper and ShardingSphere-Proxy being disconnected by the ELB timeout.

The shopper despatched information to a TCP connection that had been closed by the ELB, ensuing within the error Connection reset by peer.

Timeout simulation experiment

I made a decision to conduct a easy experiment to confirm the shopper’s efficiency after a load-balancing session timeout. I carried out a packet seize throughout the experiment to research community site visitors and observe the conduct of load-balancing.

Build a load-balanced ShardingSphere-Proxy clustered surroundings

Theoretically, this text may cowl any four-tier load-balancing implementation. I chosen Nginx.

I set the TCP session idle timeout to at least one minute, as seen beneath:

consumer  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log discover;
pid        /var/run/;

occasions {
    worker_connections  1024;

stream {
    upstream shardingsphere {
        hash $remote_addr constant;

        server proxy0:3307;
        server proxy1:3307;

    server {
        hear 3306;
        proxy_timeout 1m;
        proxy_pass shardingsphere;

Construct a Docker compose file

Here’s a Docker compose file:

model: "3.9"

    picture: nginx:1.22.0
      - 3306:3306
      - /path/to/nginx.conf:/and so on/nginx/nginx.conf

    picture: apache/shardingsphere-proxy:5.3.0
    hostname: proxy0
      - 3307

    picture: apache/shardingsphere-proxy:5.3.0
    hostname: proxy1
      - 3307

Startup surroundings

Start the containers:

 $ docker compose up -d 
[+] Running 4/4
 ⠿ Network lb_default     Created                                                                               0.0s
 ⠿ Container lb-proxy1-1  Started                                                                               0.5s
 ⠿ Container lb-proxy0-1  Started                                                                               0.6s
 ⠿ Container lb-nginx-1   Started

Simulation of client-side same-connection-based timed duties

First, assemble a client-side deferred SQL execution. Here, the ShardingSphere-Proxy is accessed by means of Java and MySQL Connector/J.

The logic:

  1. Establish a connection to the ShardingSphere-Proxy and execute a question to the proxy.
  2. Wait 55 seconds after which execute one other question to the proxy.
  3. Wait 65 seconds after which execute one other question to the proxy.
public static void most important(String[] args) {
    strive (Connection connection = DriverSupervisor.getConnection("jdbc:mysql://", "root", "root"); Statement assertion = connection.createStatement()) {
    } catch (Exception e) {
        log.error(e.getMessage(), e);

personal static String getProxyModel(Statement assertion) throws SQLException {
    strive (ResultSet resultSet = assertion.executeQuery("select version()")) {
        if (resultSet.subsequent()) {
            return resultSet.getString(1);
    throw new UnsupportedOperationException();

Expected and client-side run outcomes:

  1. A shopper connects to the ShardingSphere-Proxy, and the primary question is profitable.
  2. The shopper’s second question is profitable.
  3. The shopper’s third question leads to an error as a consequence of a damaged TCP connection as a result of the Nginx idle timeout is about to at least one minute.

The execution outcomes are as anticipated. Due to variations between the programming language and the database driver, the error messages behave in a different way, however the underlying trigger is similar: Both TCP connections have been disconnected.

The logs are proven beneath:

15:29:12.734 [main] INFO icu.wwj.whats up.jdbc.ConnectToLBProxy - 5.7.22-ShardingSphere-Proxy 5.1.1
15:30:07.745 [main] INFO icu.wwj.whats up.jdbc.ConnectToLBProxy - 5.7.22-ShardingSphere-Proxy 5.1.1
15:31:12.764 [main] ERROR icu.wwj.whats up.jdbc.ConnectToLBProxy - Communications hyperlink failure
The final packet efficiently obtained from the server was 65,016 milliseconds in the past. The final packet despatched efficiently to the server was 65,024 milliseconds in the past.
        at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(
        at com.mysql.cj.jdbc.StatementImpl.executeQuery(
        at icu.wwj.whats up.jdbc.ConnectToLBProxy.getProxyModel(
        at icu.wwj.whats up.jdbc.ConnectToLBProxy.most important(
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications hyperlink failure

The final packet efficiently obtained from the server was 65,016 milliseconds in the past. The final packet despatched efficiently to the server was 65,024 milliseconds in the past.
        at java.base/jdk.inner.mirror.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.inner.mirror.NativeConstructorAccessorImpl.newInstance(
        at java.base/jdk.inner.mirror.DelegatingConstructorAccessorImpl.newInstance(
        at java.base/java.lang.mirror.Constructor.newInstanceWithCaller(
        at java.base/java.lang.mirror.Constructor.newInstance(
        at com.mysql.cj.exceptions.ExceptionManufacturing facility.createException(ExceptionManufacturing
        at com.mysql.cj.exceptions.ExceptionManufacturing facility.createException(ExceptionManufacturing
        at com.mysql.cj.exceptions.ExceptionManufacturing facility.createException(ExceptionManufacturing
        at com.mysql.cj.exceptions.ExceptionManufacturing facility.createCommunicationsException(ExceptionManufacturing
        at com.mysql.cj.protocol.a.NativeProtocol.readMessage(
        at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(
        at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(
        at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(
        at com.mysql.cj.protocol.a.NativeProtocol.sendQueryString(
        at com.mysql.cj.NativeSession.execSQL(
        at com.mysql.cj.jdbc.StatementImpl.executeQuery(
        ... 2 widespread frames omitted
Caused by: Can not learn response from server. Expected to learn 4 bytes, learn 0 bytes earlier than connection was unexpectedly misplaced.
        at com.mysql.cj.protocol.FullReadInputStream.readFully(
        at com.mysql.cj.protocol.a.EasyPacketReader.readHeaderLocal(
        at com.mysql.cj.protocol.a.EasyPacketReader.readHeader(
        at com.mysql.cj.protocol.a.EasyPacketReader.readHeader(
        at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(
        at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(
        at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(
        at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(
        at com.mysql.cj.protocol.a.NativeProtocol.readMessage(
        ... 8 widespread frames omitted

Packet seize outcomes evaluation

The packet seize outcomes present that after the connection idle timeout, Nginx concurrently disconnects from the shopper and the proxy over TCP. However, the shopper will not be conscious of this, so Nginx returns an RST after sending the command.

After the Nginx connection idle timeout, the TCP disconnection course of with the proxy completes usually. The proxy is unaware when the shopper sends subsequent requests utilizing the disconnected connection.

Analyze the next packet seize outcomes:

  • Numbers 1–44 are the interplay between the shopper and the ShardingSphere-Proxy to determine a MySQL connection.
  • Numbers 45–50 are the primary question carried out by the shopper.
  • Numbers 55–60 are the second question executed by the shopper 55 seconds after the primary question is executed.
  • Numbers 73–77 are the TCP connection disconnection processes initiated by Nginx to each the shopper and ShardingSphere-Proxy after the session occasions out.
  • Numbers 78–79 are the third question executed 65 seconds after the shopper executes the second question, together with the Connection Reset.

(Wu Weijie, CC BY-SA 4.0)

Wrap up

Troubleshooting disconnection points includes analyzing each the ShardingSphere-Proxy settings and the configurations enforced by the cloud service supplier’s ELB. It’s helpful to seize packets to know when specific occasions—particularly DST messages—happen in comparison with idle time and timeout settings.

The above implementation and troubleshooting state of affairs is predicated on a particular ShardingSphere-Proxy deployment. For a dialogue of cloud-based choices, see my followup article. ShardingSphere on Cloud presents extra administration choices and configurations for quite a lot of cloud service supplier environments.

This article is customized from A Distributed Database Load Balancing Architecture Based on ShardingSphere: Demo and User Case and is republished with permission.

Most Popular

To Top