System Architecture
The system consists of two different layers: first, we have the broker server that handles all direct requests from end users or API calls alike. Second, we have a layer of so-called worker servers, each implementing some sort of machine translation functionality. All communication between users and workers is channeled through the broker server which acts as a central “proxy” server. For users, both broker and workers “constitute” the MT Server Land application.
Human users connect to the system using any modern web browser; API access is possible through XML-RPC calls. It would be relatively easy to extend the API to support other protocols such as SOAP or REST. By design, all internal method calls that connect to the worker layer have to be implemented with XML-RPC. To prevent encoding problems with the input text, all data exchanged between broker and workers is sent and received as Base64-encoded strings; the broker server takes care of the necessary conversion steps.
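The Base64 conversion step can be illustrated with a short sketch. It is written for Python 3 and assumes that text is encoded as UTF-8 before the Base64 step; the source does not state the character encoding, and the helper names `encode_text` and `decode_text` are hypothetical.

```python
import base64


def encode_text(text):
    """Encode a text as an ASCII-only Base64 string (UTF-8 is an assumption)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(payload):
    """Reverse encode_text(): Base64 string back to the original text."""
    return base64.b64decode(payload.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    payload = encode_text("Größe und Maße")
    # The payload contains only ASCII characters and is safe inside XML-RPC.
    print(payload)
    print(decode_text(payload))
```

Because the payload is pure ASCII, umlauts and other non-ASCII characters in the input pass through the XML-RPC layer unchanged.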
The broker server has been implemented using the Django web framework, which takes care of low-level tasks and allows for rapid development and clean design of components. We have used the framework for other project work before and think it is well suited to the task. More information on Django can be found on the project website at http://www.djangoproject.com/. The framework itself is available under an open-source BSD license.
The broker server implements two main Django object models, which we describe below. Please note that we have also developed additional object models, e.g. for quota management. See the source code for more information.
- WorkerServer stores all information related to a remote worker server. This includes source and target language, the respective hostname and port address as well as a name and a short description.
- TranslationRequest models a translation job and related information such as the chosen worker server, the source text and the assigned request id. Furthermore, we store additional information about the creation date and the owner, and also prepare some fields for caching of translation result and state.
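To make the two models easier to picture, here is a rough sketch using plain Python 3 dataclasses instead of Django models. All field names and types are assumptions derived from the descriptions above; the actual definitions are in the source code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class WorkerServer:
    """Hypothetical field layout for a remote worker server."""
    name: str
    description: str
    hostname: str
    port: int
    source_language: str
    target_language: str


@dataclass
class TranslationRequest:
    """Hypothetical field layout for a translation job."""
    request_id: str
    owner: str
    worker: WorkerServer
    source_text: str
    created: datetime = field(default_factory=datetime.now)
    result: Optional[str] = None  # cached translation result, if any
    ready: bool = False           # cached state of the request
```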
We developed a browser-based web interface to access and use the MT Server Land application. End users first have to authenticate before they can access their dashboard, which lists all known translation requests for the current user and also allows them to create new requests. Once a translation request has been completed by the chosen worker server, the result is transferred to (and also cached by) the object instance inside the broker server’s data storage. The user can view the result within the dashboard or download the file to a local hard disk. Translation requests can be deleted at any time, effectively terminating the corresponding thread within the connected worker server.
In parallel to the browser interface, we have designed and implemented an API that allows applications to connect to the MT functionality provided by our application using XML-RPC. Again, we require authentication before any machine translation can be used. We provide methods to list all requests for the current “user” (i.e. the application account) and to create, download, or delete translation requests. Extension to REST or SOAP protocols is possible.
We assume the broker server code has been extracted to /BrokerServer/. Like any other Django project, the broker server can be started in debug mode using the python manage.py runserver command. For deployment of the system, we have internally used the lighttpd web server, a lightweight, fast and open-source web server that can easily be combined with a Django application. More information can be found on the project website at http://www.lighttpd.net/. We have configured the web server to serve all Django media files and to send all other requests to the Django FCGI server that runs in a background process. A sample server configuration file lighttpd-django.conf and start/stop scripts for Django’s FCGI mode are contained in the source code release package.
Actual machine translation functionality is implemented by a layer of so-called worker servers that are connected to the central broker server. We have created a Python-based AbstractWorkerServer class, which is the foundation for all worker implementations. The basic worker interface is described below:
- finished: Boolean that controls the main server loop. Defaults to False.
- server: The actual SimpleXMLRPCServer instance is bound here.
- jobs: Dictionary memorizing all translation requests the worker has accepted. Maps request ids as keys to Process objects that represent the actual worker threads. Request ids are random 32-digit hexadecimal UUID numbers.
- language_pairs: List containing tuples that encode the available language pairs. (This feature is currently not used but might play a role in a future release.)
- __init__: Constructor, takes care of setting up the logging and the actual XML-RPC server instance.
- start_worker: Starts the main server loop that handles requests.
- stop_worker: Sets finished to True and terminates all running translation processes. Intermediate results are lost; the file storage of the worker server should be cleaned afterwards to avoid keeping invalid requests.
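How the finished flag, start_worker and stop_worker interact can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions, not the actual AbstractWorkerServer code: the class name `LoopingWorker`, the 0.1 second timeout and the use of port 0 (let the operating system pick a free port) are all hypothetical, and the original Python 2 code would import `SimpleXMLRPCServer` from a different module.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class LoopingWorker:
    """Hypothetical stand-in showing a server loop controlled by `finished`."""

    def __init__(self, host="127.0.0.1", port=0):
        self.finished = False
        self.server = SimpleXMLRPCServer((host, port), logRequests=False)
        # handle_request() gives up after 0.1 s when idle, so the loop
        # below can re-check the `finished` flag regularly.
        self.server.timeout = 0.1
        self.server.register_function(self.is_alive)
        self.server.register_function(self.stop_worker)

    def is_alive(self):
        return True

    def start_worker(self):
        # Main server loop: handle one request at a time until stopped.
        while not self.finished:
            self.server.handle_request()
        self.server.server_close()

    def stop_worker(self):
        self.finished = True
        return True


if __name__ == "__main__":
    worker = LoopingWorker()
    thread = threading.Thread(target=worker.start_worker)
    thread.start()
    proxy = ServerProxy("http://127.0.0.1:%d" % worker.server.server_address[1])
    print(proxy.is_alive())
    proxy.stop_worker()
    thread.join()
```

The sketch leaves out the termination of running translation processes, which the real stop_worker also performs.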
- list_requests: Returns a list of all registered translation request ids.
- is_alive: Returns True to signal that the worker server is up and running.
- is_busy: Checks whether the worker server is currently processing requests.
- is_ready: Checks whether the request with the given request id is finished.
- is_valid: Checks whether the request id is valid, i.e. contained within jobs.
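A caller could combine is_valid and is_ready into a simple polling loop. The helper below is a hypothetical sketch in Python 3, not part of the broker code: `wait_until_ready`, its timeout and its polling interval are assumptions. The `worker` argument can be any object offering the two methods, for example an XML-RPC proxy.

```python
import time


def wait_until_ready(worker, request_id, timeout=30.0, interval=0.5):
    """Poll `worker` until `request_id` is finished.

    Returns True once the request is ready and False if `timeout`
    seconds pass first. Raises ValueError for unknown request ids.
    """
    if not worker.is_valid(request_id):
        raise ValueError("unknown request id: %s" % request_id)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if worker.is_ready(request_id):
            return True
        time.sleep(interval)
    return False
```

Polling keeps the worker interface stateless from the caller's point of view, at the cost of a delay of up to one interval before a finished result is noticed.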
- start_translation: Creates a request id by computing a random UUID (UUID-4) and encoding that as a 32-digit hex String. Once the request id has been created, the source text is copied to a local file inside the worker server’s storage and a Process that calls the handle_translation handler is started.
- fetch_translation: Retrieves the translation result for the given request id if already available. Otherwise returns an empty String.
- delete_translation: Deletes the translation request with the given request id from the jobs dictionary, terminating the connected process if still running.
- handle_translation: Implements the actual translation functionality of a worker implementation. Custom worker servers need to override this method.
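The life cycle of a translation request can be sketched with a simplified, in-memory stand-in. This Python 3 sketch is not the real AbstractWorkerServer: the class names `SketchWorker` and `UppercaseWorker` are hypothetical, threads replace the Process objects described above, and dictionaries replace the file storage.

```python
import threading
import uuid


class SketchWorker:
    """Simplified in-memory stand-in for a worker (illustration only)."""

    def __init__(self):
        self.jobs = {}     # request id -> thread running the translation
        self.sources = {}  # request id -> source text (real worker: a file)
        self.results = {}  # request id -> translation (real worker: a file)

    def start_translation(self, text):
        request_id = uuid.uuid4().hex  # random UUID-4 as 32-digit hex string
        self.sources[request_id] = text
        job = threading.Thread(target=self.handle_translation,
                               args=(request_id,))
        self.jobs[request_id] = job
        job.start()
        return request_id

    def is_valid(self, request_id):
        return request_id in self.jobs

    def is_ready(self, request_id):
        return request_id in self.results

    def fetch_translation(self, request_id):
        # Empty string signals "no result available (yet)".
        return self.results.get(request_id, "")

    def delete_translation(self, request_id):
        job = self.jobs.pop(request_id, None)
        if job is not None:
            job.join()  # threads cannot be terminated, so wait instead
        self.sources.pop(request_id, None)
        self.results.pop(request_id, None)

    def handle_translation(self, request_id):
        raise NotImplementedError("subclasses provide the translation")


class UppercaseWorker(SketchWorker):
    """Dummy worker: the 'translation' is the upper-cased source text."""

    def handle_translation(self, request_id):
        self.results[request_id] = self.sources[request_id].upper()
```

Using threads keeps the sketch short, but unlike processes they cannot be killed, which is why delete_translation waits for the job here instead of terminating it.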
Worker servers can be implemented by subclassing AbstractWorkerServer and creating a custom handle_translation method. The following listing shows the actual code for a “Google worker” server that sends its input text to Google Translate and extracts the translation from the resulting web page. Please note that the Google worker is configured to translate from German→English only; the language pair could, however, be parameterized.
""" Implementation of a worker server that connects to Google Translate.

Currently translates from German->English only.
"""
import re
import sys
import urllib
import urllib2

from worker import AbstractWorkerServer


class GoogleWorkerServer(AbstractWorkerServer):
    """
    Implementation of a worker server that connects to Google Translate.
    """
    __name__ = 'GoogleWorkerServer'

    def handle_translation(self, request_id):
        """
        Translation handler that sends the source text to Google Translate.

        Writes the translation extracted from the result page to the target file.
        """
        # Read the source text that was stored for this request.
        source = open('/tmp/{0}.source'.format(request_id), 'r')
        text = source.read()
        source.close()

        opener = urllib2.build_opener(urllib2.HTTPHandler)

        # Build the POST request; the language pair is fixed to de -> en.
        the_url = 'http://translate.google.com/translate_t'
        the_data = urllib.urlencode({'js': 'n', 'text': text,
          'sl': 'de', 'tl': 'en'})
        the_header = {'User-agent': 'Mozilla/5.0'}

        request = urllib2.Request(the_url, the_data, the_header)
        handle = opener.open(request)
        content = handle.read()
        handle.close()
        #raw_result = unicode(content, 'utf-8')

        # Extract the translation from the returned HTML page.
        result_exp = re.compile('<textarea name=utrans wrap=SOFT ' \
          'dir="ltr" id=suggestion.*>(.*?)</textarea>', re.I|re.U)

        result = result_exp.search(content)

        # Store the translation only if the page contained a result.
        if result:
            target = open('/tmp/{0}.target'.format(request_id), 'w')
            target.write(result.group(1))
            target.close()
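One way to parameterize the language pair is to move the request data into a small helper. The sketch below is written for Python 3, where `urlencode` lives in `urllib.parse`; the function name `build_query` and its default arguments are hypothetical, and the parameter names js, text, sl and tl are simply taken from the listing above.

```python
from urllib.parse import urlencode


def build_query(text, source_lang="de", target_lang="en"):
    """Return the URL-encoded request data for one translation.

    `source_lang` and `target_lang` replace the hard-coded 'de'/'en'.
    """
    return urlencode({"js": "n", "text": text,
                      "sl": source_lang, "tl": target_lang})


if __name__ == "__main__":
    print(build_query("Guten Tag", "de", "fr"))
```

A worker could then pick the two language codes from its language_pairs attribute instead of fixing them in handle_translation.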