Using Apache Thrift for Python & C++

Apache Thrift is a software framework for cross-language service development: it provides what is essentially a remote procedure call (RPC) interface that lets a client application access services from a server, which can be written in the same language or a different one. Thrift supports all of the major languages that you’d expect to use, including Python, C++, Java, JavaScript / Node.js, and Ruby.

Unfortunately, whilst there are quite a few tutorials on Thrift, some of them concentrate on explaining how Thrift works behind the scenes (which is important, of course) rather than on how to use it. There also aren’t that many that concentrate on using C++. So having spent some time working through some of the tutorials (and the book Learning Apache Thrift), I thought I’d have a go at writing something of a practical guide. I’m quite deliberately not going to focus on how Thrift works: if you want that, let me suggest the Apache Thrift – Quick Tutorial. I’m also going to assume that you have Thrift installed on your computer already (again, there are lots of sets of instructions on how to do this; Google will be your friend), and that you’re using a Linux or macOS computer. You should be able to follow along with most of this if you’re using Windows, but there will be a few differences, especially when it comes to compiling C++ files.

The Service Description File

The first thing that you need to do when using Thrift is to write a description of the service that you want to create, using the Thrift Interface Definition Language (IDL).

The format of the file is pretty simple. At its simplest it can contain just four lines and still do something useful.

service Logger
{
    oneway void timestamp (1: string filename)
}

As you can see, this defines a service called Logger, which has one method named timestamp. The method takes a single argument (of type string) called filename, and doesn’t return anything. The keyword oneway means that the client code generated by Thrift will send the request and carry on without waiting for a response from the server.

Thrift has a small set of base types, which map onto the corresponding data types in each of the supported languages.

Type      Detail
bool      Boolean
byte      An 8-bit signed integer
i16       A 16-bit signed integer
i32       A 32-bit signed integer
i64       A 64-bit signed integer
double    A 64-bit floating-point value
string    A string

Note that there are no unsigned data types in Thrift…
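If you do need to carry an unsigned value, one option is simply to use the next-larger signed type (an i64 comfortably holds any unsigned 32-bit value). Another is to reinterpret the bits on each side of the call. Here’s a minimal Python sketch of the second approach; the helper names u32_to_i32 and i32_to_u32 are my own and aren’t part of Thrift.

```python
import struct


def u32_to_i32(value):
    """Reinterpret an unsigned 32-bit value as the signed i32 with the same bits."""
    return struct.unpack(">i", struct.pack(">I", value))[0]


def i32_to_u32(value):
    """Undo u32_to_i32 on the receiving side."""
    return struct.unpack(">I", struct.pack(">i", value))[0]


if __name__ == "__main__":
    wire_value = u32_to_i32(4000000000)  # too large for a signed i32 as-is
    print(wire_value)                    # what would travel in the i32 field
    print(i32_to_u32(wire_value))        # what the receiver recovers
```

Both ends have to agree on the convention, of course, so it’s worth a comment in the .thrift file if you go down this route.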

Thrift also supports the definition of structs, as well as container types: lists, sets, and maps (dictionaries).

Let’s look at a more complete example .thrift file.

namespace cpp Logger
namespace py LoggerPy

service Logger
{
    oneway void timestamp (1: string filename)
    string get_last_log_entry (1: string filename)
    void write_log (1: string filename, 2: string message)
    i32 get_log_size (1: string filename)
}

This file also introduces another concept: Thrift namespaces. These are optional, but when included they let you specify the namespace to be used for the Thrift-generated code. They are language-specific (in the example, py relates to Python and cpp relates to C++).

As this example shows, a service can have several methods. You can also define multiple services in one file, but that makes the generated code even more complex (as each service is implemented separately); so for simplicity, I’d suggest defining only a single service per file.

There’s one last thing we can add to the file before we run the Thrift generator. In this example, given that we’re dealing with file I/O, we might have situations where that I/O causes errors. Typically we’d handle such errors by the use of exceptions. Thrift lets us define exceptions to be used in conjunction with Thrift code.

So here’s the full version of the LoggerService.thrift file. (Note that the name of the file will be used in the names of some of the resulting files; so whilst you can name the file however you like, I’d recommend naming it something sensibly related to the service that it defines).

namespace py LoggerPy
namespace cpp LoggerCpp

exception LoggerException
{
    1: i32 error_code,
    2: string error_description
}

service Logger
{
    oneway void timestamp (1: string filename)
    string get_last_log_entry (1: string filename) throws (1: LoggerException error)
    void write_log (1: string filename, 2: string message) throws (1: LoggerException error)
    i32 get_log_size (1: string filename) throws (1: LoggerException error)
}

This is pretty much the same as the previous example, but you can now see more clearly why we’d want to use oneway for some void methods but not for others. Since, by definition, the calling client won’t wait for a reply to a oneway method, we can’t throw an exception back to the client. So in this example we’ll have to handle any error in timestamp silently, within the function that implements it.
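To make that concrete, here’s a rough sketch of what “handling the error silently” might look like for a timestamp handler. It’s in Python purely for brevity, it’s a plain function rather than real Thrift-generated code, and the logger name "logger_service" is just something I made up for illustration.

```python
import logging
import os
import tempfile
import time

log = logging.getLogger("logger_service")  # hypothetical logger name


def timestamp(filename):
    """Append the current time to the log file.

    Written in the style of a oneway handler: the client is not waiting
    for a reply, so a failure is recorded on the server side, not raised.
    """
    try:
        with open(filename, "a") as f:
            f.write(time.ctime() + "\n")
    except OSError as exc:
        log.warning("timestamp: could not write to %s: %s", filename, exc)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    timestamp(os.path.join(folder, "logfile.log"))             # succeeds
    timestamp(os.path.join(folder, "no_such_dir", "log.log"))  # logs a warning
```

The caller sees exactly the same thing in both cases, which is the point: with oneway, the only place the failure can be noticed is on the server.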

Generating Code

Now that we have the IDL specification complete for our service, we can invoke Thrift and tell it to build some code in the languages that we specify. In this example, I’m going to start by defining the service in C++, and have that service called by a Python client.

To run Thrift simply run: thrift --gen py --gen cpp LoggerService.thrift

Each language generator specified will result in a directory named gen-xxx (where xxx is the Thrift shorthand for the language, e.g. cpp, py, etc.) in the folder that Thrift was run from.
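If you’d rather drive the generator from a script, the command line above is easy to assemble. This is only a sketch: the function names are my own, it assumes the thrift compiler is on your PATH, and it assumes plain language names such as py and cpp (no generator options).

```python
import shutil
import subprocess


def thrift_command(idl_file, languages):
    """Build the argument list, e.g. thrift --gen py --gen cpp <idl_file>."""
    cmd = ["thrift"]
    for lang in languages:
        cmd += ["--gen", lang]
    cmd.append(idl_file)
    return cmd


def expected_output_dirs(languages):
    """Each generator writes into a directory named gen-<language>."""
    return ["gen-" + lang for lang in languages]


def generate(idl_file, languages):
    """Run the Thrift compiler and return the directories it should create."""
    if shutil.which("thrift") is None:
        raise RuntimeError("thrift compiler not found on PATH")
    subprocess.run(thrift_command(idl_file, languages), check=True)
    return expected_output_dirs(languages)


if __name__ == "__main__":
    # Only prints the command; call generate() to actually run it.
    print(" ".join(thrift_command("LoggerService.thrift", ["py", "cpp"])))
```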

C++

For C++, Thrift actually helps us out quite a lot by building a dummy version of the service implementation, which will be named (in this case) Logger_server.skeleton.cpp.

It also generates a pair of .cpp & .h files named for the service (here Logger.cpp and Logger.h, which highlights a good reason not to include the word service in the name of your service), and two pairs of files named LoggerService_types and LoggerService_constants (taking their names from the name of the .thrift file used for the generation).

These files (as with pretty much all machine generated code) are pretty heavy-going to try to read; but you don’t really need to do anything other than include them in the compilation stage…

The one exception to that is the server. Recommended practice is to make a copy of the …skeleton.cpp file, and to build out from there.

The skeleton is a pretty short file — and you should be able to see very easily which bits you need to change to actually make your service do something useful. If we don’t make any changes to the server it should still run, but obviously won’t do anything especially useful (apart from echoing the name of the service method to the server process’s stdout).

For example, here’s the code that the server will use to write a message line to the log.

void write_log(const std::string& filename, const std::string& message)
{
    std::fstream file;
    file.open(filename.c_str(), std::fstream::out | std::fstream::app);
    if (file.is_open())
    {
        file << message << std::endl;
        file.close();
    }
    else
    {
        LoggerException err;
        err.error_code = 1;
        err.error_description = std::string("Could not open file ") + filename;
        throw err;
    }
}

A slightly more complex example is the method to return the last line from the log file.

...

void get_last_log_entry(std::string& _return, const std::string& filename)
{
    std::fstream file;

    file.open(filename.c_str(), std::fstream::in);
    if (file.is_open())
    {
        std::string line;
        std::string lastline;
        while(std::getline(file, line))
            lastline = line;
        _return = lastline;
        file.close();
    }

    else
    {
        LoggerException err;
        err.error_code = 1;
        err.error_description = std::string("Could not open file ") + filename;
        throw err;
    }
}

Note that when using a string as the return value in C++ we can’t use the normal C++ function return; but rather the generated C++ function returns void, and has an extra parameter named _return. This has been added by Thrift in code-generation, and is a pass-by-reference parameter that we set the “return” value to before we exit the method definition.
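For comparison, the Python side needs no _return parameter: a handler method simply returns the string. Here’s a minimal sketch of the same logic as a plain Python function. The LoggerException class here is a stand-in I’ve written by hand so that the example runs on its own; in a real project you’d import the one Thrift generates into LoggerPy/ttypes.py.

```python
import os
import tempfile


class LoggerException(Exception):
    """Hand-written stand-in for the exception class Thrift would generate."""

    def __init__(self, error_code=None, error_description=None):
        super().__init__(error_description)
        self.error_code = error_code
        self.error_description = error_description


def get_last_log_entry(filename):
    """Return the last line of the file, or raise LoggerException."""
    try:
        with open(filename) as f:
            last_line = ""
            for line in f:
                last_line = line.rstrip("\n")
            return last_line
    except OSError:
        raise LoggerException(1, "Could not open file " + filename)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "logfile.log")
    with open(path, "w") as f:
        f.write("first entry\nsecond entry\n")
    print(get_last_log_entry(path))
```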

The service is defined as a C++ class, so we can write a properly object-oriented solution if we want to. For this simple example I’m not going to do that: turning this into a version following good OO practice, where we persistently identify the log file name (for example), is left as an exercise for the reader… 🙂

Anyway, the complete code for the server is as follows.

// LoggerServer.cpp

#include "Logger.h"
#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/server/TSimpleServer.h>
#include <thrift/transport/TServerSocket.h>
#include <thrift/transport/TBufferTransports.h>

#include <chrono>
#include <cstdio>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

using namespace ::apache::thrift;
using namespace ::apache::thrift::protocol;
using namespace ::apache::thrift::transport;
using namespace ::apache::thrift::server;

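// Older Thrift releases use boost::shared_ptr. As far as I know, releases from
// 0.11.0 onwards use std::shared_ptr (from <memory>) instead.
using boost::shared_ptr;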

using namespace ::LoggerCpp;

class LoggerHandler : virtual public LoggerIf {
    public:
        LoggerHandler() 
        {
            // Your initialization goes here
        }

        void timestamp(const std::string& filename)
        {
            std::fstream file;
            file.open(filename.c_str(), std::fstream::out | std::fstream::app);
            if (file.is_open())
            {
                std::time_t now;
                now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
                file << std::ctime(&now);
                file.close();
            }
            else
            {
                // timestamp is oneway, so no reply (and no exception) can
                // reach the client; report the failure on the server instead.
                std::cerr << "Could not open file " << filename << std::endl;
            }
        }

        void get_last_log_entry(std::string& _return, const std::string& filename)
        {
            std::fstream file;

            file.open(filename.c_str(), std::fstream::in);
            if (file.is_open())
            {
                std::string line;
                std::string lastline;
                while(std::getline(file, line))
                    lastline = line;
                _return = lastline;
                file.close();
            }

            else
            {
                LoggerException err;
                err.error_code = 1;
                err.error_description = std::string("Could not open file ") + filename;
                throw err;
            }
        }

        void write_log(const std::string& filename, const std::string& message)
        {
            std::fstream file;
            file.open(filename.c_str(), std::fstream::out | std::fstream::app);
            if (file.is_open())
            {
                file << message << std::endl;
                file.close();
            }
            else
            {
                LoggerException err;
                err.error_code = 1;
                err.error_description = std::string("Could not open file ") + filename;
                throw err;
            }
        }

        int32_t get_log_size(const std::string& filename)
        {
            int32_t fs = 0;
            std::fstream file;
            file.open(filename.c_str(), std::fstream::in | std::fstream::ate);
            if (file.is_open())
            {
                // Opened with ate, so the read position starts at the end of
                // the file and tellg() gives the size in bytes.
                fs = static_cast<int32_t>(file.tellg());
                file.close();
            }

            else
            {
                LoggerException err;
                err.error_code = 1;
                err.error_description = std::string("Could not open file ") + filename;
                throw err;
            }
            return fs;
        }

};

int main(int argc, char **argv) 
{
    int port = 9090;
    shared_ptr<LoggerHandler> handler(new LoggerHandler());
    shared_ptr<TProcessor> processor(new LoggerProcessor(handler));
    shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
    shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
    shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());

    TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);
    server.serve();
    return 0;
}

Having added our (example) code to the server file — all we need to do now is to build it.

Given that you need to compile in quite a few support files, I thoroughly recommend using CMake to write the build scripts for you.

Here’s my CMakeLists.txt file for this project.

cmake_minimum_required(VERSION 3.5)

project(LoggerService)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -DHAVE_INTTYPES_H -DHAVE_NETINET_IN_H")

set(THRIFT_DIR "/usr/local/include/thrift")
set(BOOST_DIR "/usr/local/Cellar/boost/1.60.0_2/include/")

include_directories(${THRIFT_DIR} ${BOOST_DIR} ${CMAKE_SOURCE_DIR})
link_directories(/usr/local/lib)

set(BASE_SOURCE_FILES Logger.cpp LoggerService_types.cpp LoggerService_constants.cpp)
set(SERVER_FILES LoggerServer.cpp)

add_executable(LoggerServer ${SERVER_FILES} ${BASE_SOURCE_FILES})
target_link_libraries(LoggerServer thrift)

It’s hopefully self-explanatory, though you will need to change (if necessary) the paths to the installs of Thrift and Boost (a dependency of Thrift), and substitute your filenames for the code files.

Python

Having built the server in C++, we can now turn our attention to Python, where we’ll make the client application.

Unfortunately, Thrift doesn’t give us quite the same skeleton to work from; but the file itself is pretty simple, so we can easily write it from scratch.

import sys
sys.path.append('gen-py')

from LoggerPy import Logger
from LoggerPy.ttypes import LoggerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Buffered socket transport and binary protocol, matching the C++ server
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = Logger.Client(protocol)
    transport.open()

    logfile = "logfile.log"

    # oneway call: returns as soon as the request has been sent
    client.timestamp(logfile)
    print("Logged timestamp to log file")

    client.write_log(logfile, "This is a message that I am writing to the log")
    print("Last line of log file is: %s" % (client.get_last_log_entry(logfile)))
    print("Size of log file is: %d bytes" % client.get_log_size(logfile))

    transport.close()

except TTransport.TTransportException:
    print("Error starting client")

except LoggerException as e:
    # The user-defined exception declared in the .thrift file
    print("Error: %d %s" % (e.error_code, e.error_description))

except Thrift.TException as e:
    # Any other Thrift error (these have no error_code attribute)
    print("Thrift error: %s" % e)

As you can see, there’s not a lot to do there. The imports and the transport and protocol set-up at the top are essentially boilerplate; after that, all we need to do is create our client and call the methods we wish to invoke on the server, handling any exceptions raised by the server (except Thrift.TException…) and the case where the transport fails to open (which is usually caused by the server not running).

In part two, we’ll turn this the other way around, and see how to call a Python server from C++.
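One thing the client above doesn’t do is close the transport if a call fails part-way through. A small context manager is one way to tidy that up. This is a sketch rather than anything Thrift provides: the name opened is my own, it works with any object that has open() and close() methods (as Thrift’s transports do), and FakeTransport exists only so that the example runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open the transport, and always close it again, even on error."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport:
    """Hypothetical stand-in for a Thrift transport, for demonstration only."""

    def __init__(self):
        self.events = []

    def open(self):
        self.events.append("open")

    def close(self):
        self.events.append("close")


if __name__ == "__main__":
    transport = FakeTransport()
    try:
        with opened(transport):
            raise RuntimeError("simulated failure during a call")
    except RuntimeError:
        pass
    print(transport.events)
```

In the real client you would wrap the buffered transport in the same way and make the calls inside the with block.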

2 thoughts on “Using Apache Thrift for Python & C++

  1. As a quick follow-up I should add that the ‘throws’ clause for a given function in the .thrift file has a curious effect (at least in Python). Your server can still throw an exception – but it’s not of your user-defined type: but rather a generic TApplicationException. Troubleshooting this can be quite tricky until you spot the error!

    Not that I did this, of course… 🙂
