My Interview With Microsoft Bing – Search Quality Team

Lately I have been into competitive programming (actually that is not something new but it is the first time that I am committed) and working on some of the algorithm problems in sites like Topcoder, Spoj, InterviewStreet, CareerCup etc. So first thing you can do after doing some practices is to test yourself on real interview with big shots. As I remember right I have been contacted by some recruiter 1-2 month ago for a US – Redmond based position on Microsoft Bing – Search Quality Team  (which I understood is aggressively expanding right now). So they setup a phone interview last week with one of the engineers from Microsoft Research.

This interview was different from the regular interview, you generally face with some HR member on the phone but this was with technical person. This means that they already know, who you are and what you did but wanna check if you can reach the hiring bar to invite to onsite interview. If you get HR person they will probably test you with some basic questions CS questions and test cases like “Your are from XBOX team. Before you launch the product how do you test it?” stuff.

Interview last about one and a half hour (which is not the case with HR – only last 45 minutes).

First part was about little chitchat about my previous works and kind of prepare you for the next steps.

  • Detailed design talks about the project that you have done.
  • What are the hardest project that you have done?
  • Why is it that hard?
  • What do you do in your free time? Any side projects ?

Second part was about theoritical stuff about OS and design. Here are some of the questions I got as I remember

  • What is process and thread ? What are the differences between them ? (In a detailed way)
  • What is critical section ? Why is it called critical section? How do you detect critical section in code?
  • What is dead-lock? How do you prevent deadlock? What are some of the mechanisms that you aware of? Give some examples
  • What are the differences between monitors and locks ?
  • And it goes on like that by covering my operating system design textbook…

Last part was about coding in collaborative enviroment (which I am interested actually). They will let you use whichever language you are comfortable with. (Acutally I will give an advice in here – you should use c/c++ | I will explain this in later post). But because the interviewer was not prepared for interview this part actually was the shortest one. It last about 15 minutes. Here are some of the questions I remember

  • You have # gb’s of string data in a file. How do you sort it and search it? (Hint: Dont forget it does not fit into memory.)
  • Talk about some sort algorithms (skip some O(n^2) ones quickly :) they want to hear some O(nlogn) ones like quicksort or mergesort).
  • Show how quicksort works? Does it inplace or not?
  • Same for the mergesort
  • Which data structure you use if you wanna do mergesort inplace ? (hint : linkedlist)
  • You have some sort of string like. Input: “Microsoft is hiring for bing” – Output: “bing for hiring is Microsoft”
  • Lets say you do it in a recursive way. Now do it without using extra memory. (If you are using language like Java/C# dont forget strings are immutable. character array would be good choice)

Hope it helps…

 

Erlang Getting Started : Data Types – I

I have written previous blog post about Erlang and Concurrency Oriented Programming (COP) this is a follow post about Erlang data types and getting started with shell.

Installing Erlang

I prefer installing Erlang from source but you also have option using binary distributions which is available only for Windows and Linux.

Windows

  • You can download releases at http://www.erlang.org/download.html
  • Prefer using latest stable version which is now at “R15B03″
  • Rest is standart windows installation just follow the instructions.
  • One important thing is to add path of the erlang installation to PATH variables in Windows enviroment variables.

Linux

  • This is the best probably easiest to work with.
  • All you have to type this. “apt-get install erlang”
  • And package manager do the rest for you.

Mac

This is the worst part. Erlang does not have binary distribution for Mac.
So you have two option either installing from the source or using something like MacPorts.
I prefer installing from the source. Here is the steps to do it.

Feth the erlang source from erlang.org

  • tar -xzf otp_src_R15B03.tar.gz (change according to your version)
  • cd otp_src_R15B03
  • ./configure
  • make
  • sudo make install

Thats all you need.
Dealing with shell

You can test pretty much everything with erlang shell as a beginner.
Just type erl and start testing out.

~$ erl
Eshell V5.8.5 (abort with ^G)
1> hello.
hello
2>

erl is a command to start erlang shell
# > is a command line number for shell
use Ctrl+P and Ctrl+N to navigate between previous shell commands
use q(). or Ctrl+g + q
Integers

Erlang uses arbitrary sized integers so you wont get problems like integer overflow or similar issues.
Lets try some examples

Eshell V5.8.5 (abort with ^G)
1> 2323235235234234 * 56867856747654674.
132117408548404227806668534909716
2>

Ps : Also you can enter integers in different formats like base16 or base32

Variables

Ok this is where it gets quirky. Erlang have variables like in other languages but they do not actually vary much. Variables start their life as unbound and move to bound state as they get their first assigment. Meaning you have single shot to assign some value. If you try to assign value to variable which is alreay bound you will get crazy complains from erlang compiler probably about “bad match”. It may seem odd but this is actually a good thing if you are dealing with lots of concurrent operation. If you are able to change variable value in more than one place its hard to debug which value of the variable cause the error. But this way you will sure know that which value fo the variable cause the error.erl

Eshell V5.8.5 (abort with ^G)
1> X = 11.
11
2> Y = 21.
21
3> X = Y.
** exception error: no match of right hand side value 21
4> X = 13.
** exception error: no match of right hand side value 13

So variables are not really variables at least not in the sense of how they used in languages like C,C++,Java. And here is another shocking news = is not an assignment operator. Again not in the sense of Erlang, it is actually pattern matching operator. Pattern matching is not meaningful in this context right now but it will be more meaningful when we get to lists and tuples. Here is what happens when you assign 32 (actually match) value to variable Z. Before anything happened Z is just a black hole waiting to be filled but after assignment with value 32 its never changes. Actually this is like what math works.

Scope of Variables

Erlang does not have something global,private,protected variable. If you define same variable at different places in the program all of the becomes bind to that lexical unit. In this case if its a function variable you have declared will be local to that function, you can not access the same variable outside of that function and if you access it will probably be a different X defined at another lexical unit.

Pattern Matching

Ok lets look at what pattern matching really is. Pattern matching denoted by “=” operator at Erlang.

Left Hand Side = Right Hand Side

This statement means that take the value of the right hand side and then match the result agains the left hand side. When you first assign value to variable like Z = 35. Erlang takes the right hand side, cause Z is unbound it just takes the 35 value and attach it to Z by matching. (there is no rocket science matching here just take value and match to variable) If we say Z = 24. after first statement Erlang takes 24 and look at the left hand side found out that Z is bound variable and can not operate pattern matching this time. Second statement will be only true if the right hand side value is equal the previous value of the Z in this case it is 35. That pretty much pattern matching works in Erlang.

Eshell V5.8.5 (abort with ^G)
1> Z = 35.
35
2> Z = 24.
** exception error: no match of right hand side value 24
3> Z = (25 + 10).
35

Floating-Point Numbers
Floating point numbers is no different from regular integer arithmetic. Here are some examples.

Eshell V5.8.5 (abort with ^G)
1> 5 / 3.
1.6666666666666667
2> 34 / 44.
0.7727272727272727
3> 34 rem 4
3> .
2
4> 5 div 3
4> .
1

/ operator always return floating point result it does not make any difference whether arguments of statement regular integer, conversion done implicitly. But “div” and “rem” operators are different they always return integer result. Rem is a regular reminder operator return the reminder of the operation and div is regular divison operator divide first operand to second one.

Atoms

Atoms are another type that I have never heard of in any language I have used. They represent non-numerical constant values. I find them very close to C macro constant definitions or enumerated types in Java. Atoms are global and you dont have to include any file or definition to achieve that. Lets look at an example where atoms fit the case. You have definiton for months in a year. What would be the best type to define them in Erlang case it is atoms. JAN, FEB etc. Atoms generally starts with lower-case letter and can include any alphanumeric character. blue, John, john@gmail.com…

Atoms also can be used with single quote. You may think that you can confuse atoms with strings but no Erlang does not allow you type strings with single quote so its not a problem. Using with single quote you can actually create upper-case atoms. ‘Monday’, ‘Tuesday’, ‘this is spaced atom’…

The important question what is the value of the atom here? The answer is atom itself. Remember every expression must have value.

Eshell V5.8.5 (abort with ^G)
1> hello.
hello
2> ‘This is spaced atom’.
‘This is spaced atom’
3> ‘Upper case one’.
‘Upper case one’
4> burakdede87@gmail.com.
‘burakdede87@gmail.com’

Tuples

Tuples are similar to tuples type in Python when you need to combine fixed number of items into one. You need to use curly brackets for tuples. Say you have user information you can store them in tuples in this way.

Eshell V5.8.5 (abort with ^G)
1> Person = {1,{name,burak}, {lastname, dede}, {twitter, burakdede}}}.
{1,{name,burak},{lastname,dede},{twitter,burakdedee}}
2> OtherPerson = {person,{name,burak}, {lastname, dede}, {twitter, burakdede}}.
{person,{name,burak},{lastname,dede},{twitter,burakdede}}

You can easily represent person information inside tuple, actually tuples. One thing important is that you have to document what is tuple doing with atoms as in the examples. It is hard to determine what tuple doing just by looking at the items inside it. name, lastname, twitter are just some of them. This is not a must but its a common usage inside erlang.

Tuples are created when you define them and destroyed when no longer need. (yes by garbage collector) Erlang have garbage collector that do all the memory stuff and reclaim all unnecessary memory. When you define tuple from another tuple, it is referenced by the new tuple.

Eshell V5.8.5 (abort with ^G)
1> N = {name, burak}.
{name,burak}
2> L = {lastname, dede}.
{lastname,dede}
3> P = {person, N, L}.
{person,{name,burak},{lastname,dede}}
4> P = {person2, N, L2}.
* 1: variable ‘L2′ is unbound

As you see you get error when you use unbound variable along with tuple definition. Shell saying that this variable does not have value so I can not use it to match left hand side.
Tuple creation is easy how about extracting values from tuple. Here is how.

6> Person = {person, burak, dede}.
{person,burak,dede}
7> {person, X, Y} = Person.
{person,burak,dede}
8> X.
burak
9> Y.
dede

We created a person tuple with person, burak and dede atoms which kind of document object itself. You have to match Person to another tuple in order to extract values from it. When you assign right hand side Person to new {person, X, Y} tuple X and Y both assigned to burak and dede particularly. It is really pattern matching rather than assignment actually. You can try more complex examples yourself.

I think this post is long enough to get the first idea of erlang types (which is not complete, you need lists and strings).

Sorry for typos.

Erlang – Concurrency Oriented Programming

Functional World

Recently I have been looking for a functional programming language to add to my current skillset and in a small amount of time I chose Erlang from list of languages like Scala, Haskell, F#, OCaml. I am gonna explain what the language trying to solve and what is philosophy behind it.

Little Background

Erlang is a language that is emerged from telecom industry (which is not a very common thing and seem interesting to me). The problem Erlang trying to solve is “Making reliable distributed systems in the presence of software errors”. That is the exact line I got from Joe Armstrong thesis. (by the way thesis is very good way to start learning Erlang and how it evolved over the time to presence). It first started as a research program inside Ericsson for telecom applications and some couple of small businesses but its explotion started with the Ericssons’ open sourcing the language effort and the libraries called OTP(open telecom platform). First POC of the Erlang was a dialect of prolog then it had its own syntax and vm. I am gonna post Erlang Movie here if you wanna watch first demo of the system at Ericsson.

Philosophy Behind

Erlang is concurrency oriented langauge and deal with concurreny at language level threating them as first class citizen. This is not a common thing in imprative langauges like Java or C++ which relies on underlying operating system thread model. So yes you can deal with concurrency in Erlang at language level and with minimum cost. Software in real life has different part that is happening simulataneously, its not easy task to do this in imperative langauges with side effects. Actually this is exactly why I decide to learn functional programming language cause doing these kind of concurrency related task is very hard to do right in languages like Java, C++. In imperative languages one of the most important thing you are dealing is “side effects”. This is very hard thing to trace if you have error in your system. There are so much thing and parts changing the same thing its is very hard to debug and spot the problematic part easily. Also concurrent code in imperative languges is really tricky business and took a lot of your time to do right.

Other part is reliabilty of the system. In imperative languages one error in the one part of a system may take down your hole applications. This does not have to be this way but it is. Independent parts of the system act as it is and if one of them fails, it should fail fast by not effecting others. (fail-fast) Also this does not mean that independent parts of the system have to be in the same physical hardware, it can be another node which is failing.

Erlang Problem Domain

  • The system must be able to handle very large numbers of concurrent activities.
  • Actions must be performed at a certain point in time or within a certain time.
  • Systems may be distributed over several computers
  • The system is used to control hardware
  • The sodware systems are very large.
  • The system exhibits complex functionality such as, feature interaction.
  • The systems should be in continuous operation for many years
  • Sodware maintenance (reconfiguration, etc) should be performed without stopping the system.
  • There are stringent quality, and reliability requirements
  • Fault tolerance both to hardware failures, and sodware errors, must be provided.

Most of the functional languages take concurrency very seriously and act as they are part first class citizen. Say you are driving a car and you are aware that there are also other people driving car at the same time. You can not just start driving car as you are the only one person in the world driving car at that time. Starting your journey this way emerge a lot of trouble in next phases. World is concurrent so the software that mimic it. Here are couple of rules that make clear how Erlang model works.

Erlang Rules

  1. Everything is a process and we can divide some parts of the system into independent processes.
  2. All of these processes are isolated from each other. (No shared memory or another mechanism)
  3. There is one and only way for to process to communicate. It is message passing.
  4. No data shared among processes.
  5. Process have to work in fail-fast approach if error occurs.
  6. Process must have unique names. (this also helps with the message passing between processes)
  7. If you know the unique name of the process you can send a message to it.
  8. Process creation and destruction is lightweight process.

Simple Hello World
Ok now we know what Erlang is and what problem it is solving. Lets make canonical hello world app and finish this post.

-module(hello).
-export([hello_world/0]).

hello_world() -> io:fwrite(“hello, world\n”).

Erlang programs start with module definitions just like in the hello worlds program (-modeul(hello))
Then it followed by import/export (-export([hello_world/0])) Export syntax is telling that these are the function definitions that can be called outside of this module. In this case it is hello_world function with zero argument.
Then there is a definition for hello_world() function followed by arrow (->) which means “this is where function signutare ends, lets start implementation of it”.
io:fwrite (just like System.out.println or printf) is telling erlang to print the given string to default output with new line at the end.

Sorry for any typos.

Stop, That is not the Right Tool

I am trying to shape an idea for a while to start working on it. Application will heavily use realtime communication and synchronization between clients. First I started out defining needs of the application and what tool I have in my hands right now to make it working. I have knowladge in c/c++ but its been for a while that I did not write any production code with any of them, hence they do not provide anything for web development unless you need to rewrite some parts of the system for performance. So c family is out. Other candidates are java, python, ruby actually these are the languages, we should be talking about frameworks like django, tornado, rails, spring, play etc. (I do not profiecient on all of these frameworks but if one them seems to match the needs I would probably end up learning it). As I wrote in my previous post writing realtime application with these platforms is hard. (oh I forgot to mention node.js which I really dont like to use because of the language it is using is ugly and horrible). I tried initial idea with python and end up deleting project cause its taking a lot to get going and its really disappointing. So I set back for a while and think about what could be the right tool to make this idea work as I expect in my mind without too much hassling. So ended up choosing another language different than my current skill.

What language I have choosen is not important here (whether its a well known or not) important thing is you have to choose the right tool for the right job even it takes you to learn new language, framework or even operating system. I dont like the idea of “you have to go with what you got” which I think probably end up blowing in your face in the future even if you save the day. Another thing I see in people if something is popular you have to use it even it doesnt benefit anything to your project which again end up blowing in your face cause you deny to use proven technologies just to satisfy your ego or look cool (yes I am looking mongo people or other tech choices which led to “why I am leaving X” posts on blogs which high probably not fault of the X technology creators).  I rather choose to know a language that known by handful but really be the right thing instead of knowing a language known by millions but just not right thing.

Current State of Realtime Web App Development Looks Boring

Yeah title seems like a troll but no its not. I really dont like the way current web development works especially for realtime applications. There is really little innovation going on how to develope new realtime web applications. We are still using the pattern that people were using in 70s. You just have one (or more ) application server rendering pages according to user request and sending back. If something happens again server render the page again or just fetch another page. Although with the new kids on the block like Rails, Django, Php (generall frameoworks) its kind of removes the boring stuff but still it is not fun if you think in the terms of 2012.

I really like to listen if web application relates to something realtime . It looks like doing some crud and data give-take is not fun for me. I respect people doing this kind of applications but it just do not interest me. There is no challenge in this and it does not give me any delight to develop such an app. Worse part is if its built in a way that it just only works with page reload in every event occur on client side. (meaning not having any idea javascirpt or ajax). Now we have the infrastructure to make web more faster and realtime and importantly pleasant. But still developing realtime applications is not one or two mans job people have long schedules and big teams for that. It shouldnt be that way. (I am considering you have the best tools for these kind of applications). If there is a valuable information out there for me, I should get that as soon as its on the wire. Also we have new kids for this kind of tasks like Tornado, node.js and more but I still think its not easy task to build the idea in realtime manner within one weekend. This is really a problem and if you consider its not the right idea to put infront of the people you may end up losing your 3-4 week at best to put it in front of people.

Whenever I see new web framework adopted by people and see that it really no different that others making abstractions over abstractions but still left out the ability to develop applicaiton that facilitate the realtime. I find Tornado doing some of these things but its not something that you can put your all effort in it. (lack of async libraries, testing…) Even with Tornado its not easy task to build realtime application. Ok enough Tornado, its not easy either with node.js you still have to put a lot effort to get going.

Believe me or not realtime or realtime colloboration is really the future of web applications. Think about one second how applications like google docs, instant facebook notifications, facebook chat, twitter stream and other that you use everyday help your daily life and fun to use. Cause if something contributed by someone on the other end you should able to see it instantly and best part it creates engagement. You dont have these kind of engagement on applicaitons that is not realtime.

Creating realtime applicaiton is really hard. Big companies like facebook, google, twitter can do it because they have some talented engineers but its not the biggest priority for application of two or three people. It should be but as I said before its taking a lot of time to build.

I am sure some of people remember Etherpad which acquired by Google for like 10M$. It is a great example of realtime collaboration applicaiton that is really hard to develop. Even after google made google office and wave come close to etherpad. Here is a issues with that kind of realtime application (written by guy at google who is working on google docs and doing open source project called MobWrite – similar to etherpad doing).

  • Differential synchronization – MobWrite is a symmetrical system where the client and the server run exactly the same algorithms, sharing the work in keeping everyone in sync. Differential synchronization provides a fault-tolerant system that allows conflicts to be resolved automatically on a best-effort basis.
  • Diff, Match and Patch – At the core of both the client and the server is an efficient library that identifies local changes then merges remote changes into the local content. The matching algorithms are also used to restore the cursor or selection after remote changes are received.
  • Communication protocol – Synchronizations occur every few seconds, with the frequency automatically increasing during periods of activity and decreasing when idle. Data is transmitted across the web using Ajax requests, formatted to a custom protocol. This protocol is intended to be simple and flexible enough that new clients or servers may be created which integrate cleanly with MobWrite.
  • Text vs. Numbers – Two different conflict resolution strategies are used. Conflicting text changes are merged gently, attempting to incorporate the intention of all parties. While there exist no-win scenarios, MobWrite does the best job it can, with the understanding that an imperfect merge done in real time will be seen immediately by all affected parties. This is distinct from numeric data and enumerated types which are simply overwritten with the last selected value.
  • Cursor Preservation – When a remote update is received the local user’s cursor, selection and scrollbar needs to be adjusted carefully o make the update as smooth as possible. Two algorithms are blended together to achieve this.

I do not say “wow” to “where did method come from? – oh X framework handle it for you” these kind of things do not keep me awake at night to take a look at that framework or what ever it is. Ok may be I am simplifying things a little but really it should be simple. I only talked about framework side of the development how about deployment and other stuff. Again it is still not fun…

Design “Realtime Notification System” with Tornado

I have been looking forward to write blog post about Tornado usage for a while. Notification system is a vital part of the real time applications such as facebook, twitter (not directly notification but similar philosophy). Ok, in order to get going you have to have Tornado, MongoDb and Redis installed on your system. I am not gonna dive into details of the how to install this cause it really lousy job. Now I am gonna copy paste my detailed design which I also used in my application Vaarmi and had very good results. Enjoy !!!

(* Item : In Vaarmi context item is just a post made by user stating what they wanna buy)

(We : Its just me nobody else :) )

Notifications are important part of the user interaction on application. Most of the real time applications employ notification technique according to their needs like Twiter, Facebook and others. In this application we need notification system that notifies every user that directly connected to requested item, meaning its either owner of the item or just seller that made offer on item. Every user that made offer needs to be notified so that they do not bother on checking manually to see if there is a new offer on item or not. Checking and sending notification for one user is easy but sending notification to each user on that (users who made offer) item is clearly a hard problem to overcome in real time.

As we develop first prototype for notifications, it was not real time and was not persistent (no big deal). We provide notifications entity on MongoDB backend and pushed them into User collection for every user entity. Although it worked as expected it was not fetching the notifications effectively because javascript request are just primitive polling requests and its hard to define right interval to poll new notifications. Every request sent to check notifications were hitting the database (this is big deal), which definitely decrease the applications performance (especially in single threaded env) and we couldn’t find proper time interval to set (its not right to say “couldnt find”, it has so many parameters to calculate). If we provide long time interval user will not get notifications on time, but if we provide too short interval it will affect the application performance. So we decide to use real push technique without persistence employing publish subscribe pattern.

First thing we did is to create general system design for notifications. Here is general perspective for notification design. When user create new item by posting it (sth like I wanna buy X in whitin Y interval and in Z location), we take that user ObjectId which again unique for that user (ObjectId’s are unique keys created with document in MongoDb) and added to Redis ZSET which is basically data structure like set but its sorted. One other thing that Zset has different Zset elements have scores which is double floating point number. Zset seemed like a perfect data structure to store and track user ObjectId’s that will be notified when new event occurs and also we were already using Redis in chat. Zset have some operations like ZADD and ZREM that run in O(log(N)) to add and remove element from set respectively. ZRANGE is another operation that return range of elements from set and runs in O(log(M)) where M basically the last requested element of set.

Now Zset have just buyer id, when sellers start to make offer on item, for every offer we take ObjectId of the user and add them to Zset. We also created notification channel specific to every user to listen new notifications. If Zset have 20 user in it when new user post offer we just page through the Zset with ZRANGE command and create notification data than we push this notification data to channel of the every user on Zset. Finally we add the new offer owner to list because it should not be notified by its own offer. Because every user on Zset long polling their own notification channels each one of them will immediately notified with new notification and can access the new notification easily from user interface.

First thing to do is pushing buyer ObjectId into Zset when user create the item with unique ObjectId. We assign Zset name as ObjectId of the item. So every item will have its own Zset of users that will be notified. Although we are not using score for Zset we are setting it as current time function.

We also have shared javascript code that long polls new notifications on every page so that user never misses any notification while browsing other pages. This script is not different than traditional long polling script that we have used in chat feature.

Long polling notification handled by the “NewNotificationPushHandler” on backend side. It basically asynchronously connects to user notification channel and listens for new notifications. If notifications occur push them to client immediately. Now user can start polling notification on every page to see if there are some events occurred on items that he/she interested in. This section also explained the offer system because it is tightly integrated to notification system and using same technique with chat which was long polling. So there is no need to show same implementation twice.

Now look at the scenario where user post new offer on specific item X which have 10 unique offers. Application pushes the new offer to active channel listeners of that page instantly because it is using the same technique so new offers will flow as user stays on that page without need of refreshing the page. Application will create “notification” template object as “template.html” before sending offer update to listeners. Than will call Zrange function asynchronously with providing callback “publishNotifications”. When callback will executed it will basically loop through the clients in Zset and by using PUBLISH command it will send new offer notification to every users notification channel which has the same name with user id stored in Zset. Because every user is constantly long polling server for new notifications when backend server publish the new notification every user will get its own notification except the owner of the new notification. Than finally we added the new user id to Zset to provide him/her future notifications about item.

Tornado 101 : Introduction to Tornado

Ok this is first of series post about python Tornado framework and intent to teach people Tornado and some of the best practices. Over the series of post I will assume that you have beginner knowladge about python if not check google there are tons of documentation and lately online courses about every language.

What the hack is Tornado ?

Tornado is simple, powerful, easily scalable python web server and web framework. Most of the people either know Tornado as webserver or as web framework but its actually both. It initially created by Friendfeed guys and after the acquisition by Facebook, it now open source and has great vibrant community around it and actively in development. If you wanna play it for a while must have resources Tornado offical web site & source code, Tornado google group and of course google & stackoverflow.

There are lots of alternative what makes Tornado different is that it tries to solve C10K (concurrent ten thousand connections) problem. If you dont wanna read that long but great article from Dan Kegel here is the point of the author “Its time for webservers to handle 10.000 connections at one instance we have enough hardware and infrastructure for that why the hell we could not do it ? And then goes into details of traditional web server approach like thread-per-connection, process-per-connection. The rest of the article is about new approaches that webservers can take in order to achieve that C10K (concurrent ten thousand connections). Some of the approaches get help from OS kernel to achieve that high number concurrent connection which also used by Tornado select and epoll (unix systems) and kqueue in some FreeBSD systems (by the way this is the reason that Tornado is not a good candidate to work on Microsoft stack altough people seems to be using it with minimum problem). Meaning that in order to push boundries of Tornado you have to run it under some kind of unix variant.

Now if we look at the approach that Tornado takes and problem try to solve we can conclude that it is high performance web server+framework. Now we know that its a high performant web server what about web framework side ?. If you look at the framework its very small but elegant, builtin security mechanishms like secure cookies, cross-site-forgery, social network authentication like facebook, twitter, google, simple pythonic templating engine,  built-in mysql wrapper for most of the operations and asynch approach, how about some websocket support for your application ? its already built-in. And greatest part of it if you  dont like some built-in feature just switch with another one like templating engine too basic for you just take jinja and use.

Tornado is greate for applications that need high concurrency and performance but of course its not one framework for all kind of applications. If you gonna build new CMS or need very detailed web framework Tornado may not be for you. You have idea about some kind of interactive realtime web app such as real time analytics engine, social app, even you can use Tornado as TCP server, it does not restrict you.

Install Tornado

Enough background info lets install Tornado. I am gonna show you unix installation so if you on windows you should check out tornado offical site, last time I check it was easy to install on windows.

$ curl -L -O http://github.com/downloads/facebook/tornado/tornado-2.4.tar.gz 

$ tar xvzf tornado-2.1.1.tar.gz
$ cd tornado-2.1.1

$ python setup.py build

$ sudo python setup.py install

Also if you dont wanna deal with multiple command like above you can use pypi or easy_install to install lateset version of Tornado which is 2.4 as I am writing this article.

How about Hello World App ?

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

Here is hello world application. Lets dive into it. If you are writing Tornado app some of the things you must be familiar.

You have to import at least tornado.ioloop & tornado.web modules. There are minimum requirements for you.

In order to handle Http request tornado have RequestHandlers which again reside in RequestHandler package. Request handler have all of the Http methods like GET, POST, HEAD etc. but you just implement what you need. In hello world application we just implemented GET method if you make POST request you probably get error stating that POST method does not implemented.

Another thing is application object. We have created instance of Application object and give mappings of handlers, our MainHandler will be answering requests from “/” path. If you have further handlers you can just keep adding to there and good thing is you can easily use regex in order to match your RequestHandler. We set application to listen 8888 if this is the port you wanna run the app.

Last thing and most important is getting instance of IOLoop which we only have one. Yeah right all that high throughput comes from one thread so you have to be careful while using. I can only say that right now that its the result of async and non blocking io. (For further information you can check C10K article). Let me create another version of application with more than one handler.

import tornado.ioloop
import tornado.web
import tornado.options
import tornado.httpserver
import os,logging

from tornado.options import options,define

log = logging.getLogger(__name__)

"""
create definitions here like facebook, twitter keys
port number etc
"""

define("port", default=8888, help="default port number", type=int)
define("facebook_api_key", default="", help="facebook api key for facebook graph api")
define("facebook_secret", default="", help="facebook api secret")
define("twitter_consumer_key", default="", help="twitter api key")
define("twitter_consumer_secret", default="", help="twitter secret key")
class Application(tornado.web.Application):

    def __init__(self):

        handlers = [
            (r"/", MainHandler),
            (r"/another", AnotherHandler)
        ]

        settings = dict(
        app_title = "Hack Skeleton",
        template_path = os.path.join(os.path.dirname(__file__), "templates"),
        static_path = os.path.join(os.path.dirname(__file__),"static"),
        cookie_secret = "",
        login_url = "/login",
        facebook_api_key = options.facebook_api_key,
        facebook_secret = options.facebook_secret,
        twitter_consumer_key = options.twitter_consumer_key,
        twitter_consumer_secret = options.twitter_consumer_secret,
        xsrf_cookies = False,
        autoescape = None,
        debug = True,
        )

    tornado.web.Application.__init__(self, handlers, **settings)
class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.write("Lets start hacking !!!")

class AnotherHandler(tornado.web.RequestHandler):

    def post(self):

        first_arg = self.get_argument("x", None)
        second_arg = self.get_argument("y", None)

        # some database operation may be ??
        # get the data and render
        self.render("my_template.html", first_view_arg = first_arg, second_view_arg = second_arg)
def main():

    tornado.options.parse_command_line()
    logging.getLogger().setLevel(logging.INFO)
    http_server = tornado.httpserver.HTTPServer(Application())
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
main()

I think this approach is better than the first version because you can easily give options and parameters to application object like login path, social api keys, and you can bind other objects to application object and lets them run when the application start.

How to Run Application ?

Yeah I gave you the hello world but did not tell how to run it. Its easy like any other python program.

python yourappname.py

You can also specify the port number on run command.

python yourappname.py --port=9999

How Http Method Response Handling ?

When you wanna send specific Http response code as return you can use built-in set_status(404) which is page not found or set_status(405) which is method not allowed.

class AnotherHandler(tornado.web.RequestHandler):

    def post(self):

        first_arg = self.get_argument("x", None)
        second_arg = self.get_argument("y", None)

        if first_arg:
            self.set_status(404)
        # some database operation may be ??
        # get the data and render
        self.render("my_template.html", first_view_arg = first_arg, second_view_arg = second_arg)

I think I have covered enough to give you the idea about what is Tornado and how it works. Wait for next post for another topic.