Saturday, June 15, 2013

Drupal behind heavy doors

We just upgraded the largest daily newspaper in Israel, "Israel Hayom" (www.israelhayom.co.il), to the Drupal platform.
Drupal is a great CMS that suits our data structure very well and lets us, the developers, hand off much of the control to the editors and site administrators, with lots of very cool features and a great user experience. On the other hand, when it comes to delivering pages and content to end-users, Drupal may not be the best solution: direct access to Drupal can result in very poor performance, and every content request can become a bottleneck.


Our team's approach was to put Drupal “behind heavy doors”: no content request is granted direct access to Drupal except our own. We divided our solution into two parts: one for static content and one for interactive, dynamic content.


Most of the news content on the website today is static, and the best way to deliver static content is caching. All static content, such as articles and opinions, is cached as HTML pages. Our solution for improving static page performance with Drupal is divided into two layers:
 The first layer the end-users reach is the Akamai CDN. Akamai runs a few thousand servers spread globally that cache HTML pages by URL path, and each page is assigned a high TTL (Time To Live) of one week. If a page is new or its cache entry has been cleared, the request passes on to the next layer of the solution.
 The second layer is an Nginx reverse proxy server that saves all static pages as HTML and never clears its cache. Each submission (insert or update) of any content on the website by an editor is automatically “pushed” to the server. Caching work is thus reduced to content submissions only, rather than content requests. This is what makes the Drupal server impermeable (almost :-)) to end-user requests.
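A reverse-proxy cache of this kind might look roughly like the following Nginx sketch. The hostnames, paths, zone names, and upstream here are illustrative placeholders, not our production configuration:

```nginx
# Illustrative only: a cache zone with a very long validity window,
# so cached pages effectively never expire on their own.
proxy_cache_path /var/cache/nginx/pages levels=1:2 keys_zone=pages:100m
                 max_size=10g inactive=365d;

server {
    listen 80;
    server_name news.example.local;        # placeholder hostname

    location / {
        proxy_cache pages;
        proxy_cache_key $scheme$host$request_uri;
        proxy_cache_valid 200 365d;        # treat cached pages as fresh "forever"
        # Serve stale content rather than hitting Drupal when it is down or busy.
        proxy_cache_use_stale error timeout updating;
        proxy_pass http://drupal-backend;  # placeholder upstream to Drupal
    }
}
```

With a setup along these lines, refreshing a single page on editorial submission can be done by re-requesting its URL with a cache-bypass condition (`proxy_cache_bypass`), so the cache is repopulated on writes instead of reads.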
To find out how we solved immediate content updates on Akamai, please click here.


The website is made up of a variety of both static and dynamic pages.
Any list of content combined with user selection options, such as filtering, sorting, a search module, or even just a pager, automatically becomes an interactive page with dynamic content, which cannot use the cache system.
Since we cannot cache those pages as HTML, we built two corresponding systems:
 The first is a NodeJS / MongoDB server that synchronizes all the content. On each submission we store a content object with the relevant data in MongoDB, ready to be retrieved at any time. On the other side, we built a data API for querying these data objects by passing fields, filters, sorts, etc. (this API also serves all of our external consumers, such as mobile devices, smart TVs, etc.). We then turned all the dynamic pages on the website (usually built with the Views module) into a client-side JS rendering mechanism, using AJAX and a template engine (Backbone.js + Mustache), that interacts with the end-users and dynamically queries the API, which has a super fast response time (www.israelhayom.co.il/archive/articles).
 The second is an Apache Solr server that holds similar objects and is ready to serve any search initiated by end-users, whether from the website, a mobile device, or a smart TV.
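As a rough sketch of how a data API like the first system might translate request parameters into a MongoDB-style query, here is a minimal example. The parameter conventions (`fields`, `sort`, `limit`, everything else as equality filters) and field names are assumptions for illustration, not our actual schema:

```javascript
// Build a MongoDB-style find() spec from API query parameters.
// Hypothetical conventions: ?fields=a,b&sort=-created&limit=20,
// with any remaining key=value pairs treated as equality filters.
function buildQuery(params) {
  const { fields, sort, limit, ...filters } = params;

  // Projection: which fields of the content object to return.
  const projection = {};
  if (fields) {
    for (const f of fields.split(',')) projection[f.trim()] = 1;
  }

  // Sort spec: a leading "-" means descending order.
  const sortSpec = {};
  if (sort) {
    for (const s of sort.split(',')) {
      const key = s.replace(/^-/, '');
      sortSpec[key] = s.startsWith('-') ? -1 : 1;
    }
  }

  return {
    filter: filters,                                  // e.g. { section: 'news' }
    projection,                                       // e.g. { title: 1, created: 1 }
    sort: sortSpec,                                   // e.g. { created: -1 }
    limit: Math.min(parseInt(limit, 10) || 20, 100),  // default 20, capped at 100
  };
}

// Example: the kind of request a client-side archive page might make.
const q = buildQuery({ fields: 'title,created', sort: '-created', section: 'news' });
```

A client-side renderer (Backbone.js + Mustache in our case) would issue such requests over AJAX and render the returned objects into templates in the browser.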
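Similarly, a search request from the site or a mobile client could be turned into a Solr select URL along these lines. The core name (`articles`) and the searched fields are illustrative, not our real index layout:

```javascript
// Build a Solr /select URL for a user search.
// The 'articles' core and the title/body fields are hypothetical.
function buildSolrUrl(baseUrl, term, opts = {}) {
  const params = new URLSearchParams({
    q: `title:(${term}) OR body:(${term})`,  // match term in either field
    wt: 'json',                              // JSON response for JS clients
    rows: String(opts.rows || 10),           // page size
    start: String(opts.start || 0),          // pagination offset
  });
  return `${baseUrl}/solr/articles/select?${params.toString()}`;
}

const url = buildSolrUrl('http://search.example.local', 'elections', { rows: 20 });
```

Because both MongoDB and Solr are populated on content submission, end-user reads never have to touch Drupal itself.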


As you can see, user requests never reach the Drupal server, its routing system, or its database directly. All of the complex processes are fully controlled, maintained, and performed in the background, out of reach, or “behind heavy doors”.