My book at Google Books
June 11, 2009
Long ago I wrote a book about Linux programming (Portuguese only). I just saw that it has been added to Google Books at http://books.google.com/books?id=nMoQKiQyHwwC . There are some cute previews, including a serial port circuit picture.
link recap
May 7, 2009
Following up on this old post, there is a presentation from MySQL Conf & Expo 2009 about using Memcached that shows the basic usage pattern, which, by the way, does not involve using memcached as a database, asking whether you can search inside its keys and so on.
Another presentation, from Craigslist's Jeremy Zawodny, about using Sphinx to index MySQL data is there too.
Ezra Zygmuntowicz's Nanite was presented at Erlang Factory (which I may or may not write about later) and seems like a perfect case of AMQP and queue use (I got a pretty solid understanding of routing keys from it).
More on memcached
February 17, 2009
Sharing memcache data between different applications is useful and easy, be it as a glorified IPC, a robust distributed cache, rate limit control or any other architecture approach.
There are some caveats tho:
- The captain obvious one: if that's the case, make sure the way you store your data is readable across different languages. For example, storing a pickled object in Python and reading it in Java or Ruby is trivial, but persisting some specific objects, as Rails is prone to do, may render the data almost unreadable. Try to use simple serialization formats if possible (like YAML, JSON, XML).
- The other captain obvious one: saving and invalidating data must be done by the application responsible for its integrity, for the sake of simplicity and safety. Remember cache 101: a cache is not a database. It's not searchable, and its data must reflect a coherent source of data.
- The not so obvious one: if you use more than one memcached server, make sure both clients use the same hashing algorithm to select the server for the key you are asking for. When using the same language and client this is transparent, but there are different known ways to select the right server, such as:
- md5 hash of the key
- crc32 based hash
- native hash (as String.hashCode() in java)
- pure magic hash (some clients implement non-standard memcache
The case in point is a Ruby application using the memcache-client gem and a Java application using whalin's client. If you use more than one server, the Ruby client uses its own algorithm, which is CRC32 based. The Java client defaults to a NATIVE based algorithm, but contains 3 more algorithms. With these defaults the two applications would not reliably look for the same key on the same server, so keys would not get correct hits.
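To make the mismatch concrete, here is a minimal Java sketch that maps a few keys to a server index using both ideas. The class name, the sample keys and the pool size of 2 are assumptions made up for illustration, and the use of Math.floorMod for negative hash codes is my own choice, not necessarily what either client does.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class; not part of either client library.
class HashMismatchDemo {

    // Mirrors the NATIVE idea: Java's own String.hashCode().
    static long nativeHash(String key) {
        return (long) key.hashCode();
    }

    // Mirrors the CRC32-based idea: bits 16..30 of the CRC32 checksum.
    static long crcBasedHash(String key) {
        CRC32 checksum = new CRC32();
        checksum.update(key.getBytes(StandardCharsets.UTF_8));
        return (checksum.getValue() >> 16) & 0x7fff;
    }

    // Maps a hash to a server index; floorMod keeps negative hashes in range.
    static int serverIndex(long hash, int serverCount) {
        return (int) Math.floorMod(hash, (long) serverCount);
    }

    public static void main(String[] args) {
        int serverCount = 2; // assumed pool size
        String[] keys = {"mykey", "user:1", "user:2", "session:abc"}; // made-up keys
        for (String key : keys) {
            int byNative = serverIndex(nativeHash(key), serverCount);
            int byCrc = serverIndex(crcBasedHash(key), serverCount);
            String note = (byNative == byCrc) ? "" : "  <-- mismatch";
            System.out.println(key + " native=" + byNative + " crc32=" + byCrc + note);
        }
    }
}
```

Any key flagged as a mismatch would be written to one server by one application and looked up on another server by the other.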
Let’s see how it works :
Code for hashing in Ruby (straight from memcache_client gem)
# Note that the method crc32_ITU_T is a patch for the String class from memcache_client
def hash_for(key)
  (key.crc32_ITU_T >> 16) & 0x7fff
end
Code for the right hashing algorithm in Java:
private static long newCompatHashingAlg( String key ) {
    CRC32 checksum = new CRC32();
    checksum.update( key.getBytes() );
    long crc = checksum.getValue();
    return (crc >> 16) & 0x7fff;
}
The algorithm is selected by this piece of code, from whalin's memcache client library:
case NATIVE_HASH:
    return (long)key.hashCode();
case OLD_COMPAT_HASH:
    return origCompatHashingAlg( key );
case NEW_COMPAT_HASH:
    return newCompatHashingAlg( key );
case CONSISTENT_HASH:
    return md5HashingAlg( key );
default:
    // use the native hash as a default
    hashingAlg = NATIVE_HASH;
    return (long)key.hashCode();
So, before using the client in Java, we need to issue setHashingAlg( SockIOPool.NEW_COMPAT_HASH ); on the right SockIOPool object.
That’s it.
Now, for a change …
Really unnecessary section !
We can test the CRC32 based algorithm like this:
Start irb and type:
irb --> require "rubygems"
==> true
irb --> require "memcache"
==> true
irb --> a = "mykey"
==> "mykey"
irb --> (a.crc32_ITU_T() >> 16) & 0x7fff
==> 17510
From this, we see that 17510 is the resulting hash for “mykey” key.
The memcache client was required just to attach the crc32_ITU_T() method to the String class, but if you don't want to install it, just paste the following code (which is part of memcache_client) instead:
class String
  ##
  # Uses the ITU-T polynomial in the CRC32 algorithm.
  def crc32_ITU_T
    n = length
    r = 0xFFFFFFFF
    n.times do |i|
      # In Ruby 1.8, self[i] is the byte value at position i.
      # On Ruby 1.9 and later it is a String, so use getbyte(i) there.
      r ^= self[i]
      8.times do
        if (r & 1) != 0 then
          r = (r>>1) ^ 0xEDB88320
        else
          r >>= 1
        end
      end
    end
    r ^ 0xFFFFFFFF
  end
end
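The Ruby method above is the usual bit-by-bit CRC-32 with the reflected polynomial 0xEDB88320, which is why java.util.zip.CRC32 gives the same numbers. As a rough sketch (the class name is made up for illustration), the same loop can be written in Java and compared with the library class, using the well-known CRC-32 check string "123456789":

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class: a bit-by-bit CRC-32 following the Ruby steps.
class BitwiseCrc32 {

    static long crc32(byte[] data) {
        long r = 0xFFFFFFFFL;                 // initial value, all ones
        for (byte b : data) {
            r ^= (b & 0xFF);                  // mix in the next byte, unsigned
            for (int i = 0; i < 8; i++) {
                if ((r & 1) != 0) {
                    r = (r >>> 1) ^ 0xEDB88320L;
                } else {
                    r >>>= 1;
                }
            }
        }
        return r ^ 0xFFFFFFFFL;               // final inversion
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data);
        // Both lines print cbf43926, the standard CRC-32 check value.
        System.out.println(Long.toHexString(crc32(data)));
        System.out.println(Long.toHexString(reference.getValue()));
    }
}
```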
Let's test it on the Java side:
TestCRC.java
import java.util.zip.CRC32;

public class TestCRC {
    public static void main(String[] args) {
        CRC32 checksum = new CRC32();
        checksum.update("mykey".getBytes());
        long crc = checksum.getValue();
        System.out.println(((crc >> 16) & 0x7fff));
    }
}
Compile and run as:
$ javac TestCRC.java
$ java -cp . TestCRC
17510
Again, 17510, as in Ruby. That’s the right value for “mykey”.
Both cases yield 17510, which is then divided by the number of machines in the pool (e.g. 2); the remainder of this division is the index of the right server, both in Java and Ruby. Weee.
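As a small illustration of that last step, here is a sketch that picks a server from a list using the remainder. The class name and the server addresses are hypothetical, and real clients may add weighting or retries on top of this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: chooses a server by hash mod pool size.
class ServerPicker {

    // The hash is assumed non-negative, as it is after the & 0x7fff mask.
    static String pick(long hash, List<String> servers) {
        return servers.get((int) (hash % servers.size()));
    }

    public static void main(String[] args) {
        // Made-up addresses for a pool of two servers.
        List<String> servers = Arrays.asList("10.0.0.1:11211", "10.0.0.2:11211");
        long hash = 17510L; // the value obtained for "mykey"
        // 17510 % 2 == 0, so this prints the first server.
        System.out.println(pick(hash, servers));
    }
}
```

With three servers the same key would go to index 2 (17510 % 3 == 2), which is why both applications must also agree on the server list and its order.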
Diagrams, cache and threads 101 (and a little ranting)
December 3, 2008
I've been using Inkscape to draw diagrams, especially abusing its import openclipart feature, but I found out that for simple sequence diagrams, there is another great tool: http://www.websequencediagrams.com/.
Their API is clean and the text parsing very accurate. Choose "Napkin" in the style combo and click draw to see their demo.
I found it via another gem, this caching 101 page for dummies. Things like this and this 'Threads primer' are necessary reminders nowadays. There are a lot of butchered applications and architectures popping up everywhere, and without these kind souls, the best of them, going thru basic concepts, we're all doomed.
There goes my contribution, a kind of memcached 101, both in napkin and blue modern styles !
Paste the code below to generate these diagrams. Note the alt 'command' I used to put both cases in the same diagram. Gotta love that.
Alice->Application: Asks for her Profile
Application->Memcached: is Alice profile there ?
alt Data is not cached yet
Memcached->Application: No, it's not here
Application->Database: get me Alice's Profile
Database->Application: here is the data - it took me a lot of time, k ?
Application->Memcached: set Alice Profile there
else data is already cached yay
Memcached->Application: Got it
end
Application-->Alice: Response (her profile data)
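The flow in the diagram can also be sketched in code. This is only an illustration under loud assumptions: a HashMap stands in for memcached, a function stands in for the slow database, and the class, method and key names are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the flow in the diagram; no real memcached involved.
class CacheAsideDemo {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for memcached
    private final Function<String, String> database;           // stand-in for the slow database
    int databaseHits = 0;                                       // counts trips to the database

    CacheAsideDemo(Function<String, String> database) {
        this.database = database;
    }

    String getProfile(String user) {
        String key = "profile:" + user;
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                     // "Got it": data is already cached
        }
        databaseHits++;                        // "No, it's not here": ask the database
        String fresh = database.apply(user);
        cache.put(key, fresh);                 // set the profile in the cache
        return fresh;
    }

    public static void main(String[] args) {
        CacheAsideDemo app = new CacheAsideDemo(user -> user + "'s profile");
        System.out.println(app.getProfile("Alice")); // first call goes to the database
        System.out.println(app.getProfile("Alice")); // second call is served from the cache
        System.out.println("database hits: " + app.databaseHits); // prints "database hits: 1"
    }
}
```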
Hello there !
November 28, 2008
Hey, long time no see. What's up ? Check another mile-long post about interweb, python, php, queues here: http://zenmachine.wordpress.com/vote-for-pedro-rating-pages-with-activemq-or-any-other-broker/
Enjoy.
Google AppEngine testdrive
May 3, 2008
I got my Google AppEngine key yesterday, so, to enjoy the holiday weekend, I decided to give it a try. I built a small app called peekr, for link classification. Between reading the docs and the sample app and working thru html and css, it took me about 3 hours.
My plan is to put in a couple more hours to implement a user limit for classification, and a simple text classification algorithm, to extract some tags and relate all stored links.
In some aspects, it reminded me of when I first used Ruby on Rails. The API is very straightforward, the docs are ok and very clear about limits and boundaries, and the SDK is very simple to use. I think that's where the resemblance ends, because, according to them, you have the whole Google backend to enable scaling and so on.
That's a nice platform to test ideas and I think they may start charging for it in the future if an app requires more storage space, more CPU cycles or a separate domain name. And if it keeps working as nicely as it does, it may be a good move.
More impressions as soon as I manage to finish the application. Until then, off to eat steak.
Automatic rss feed to twitter posting
April 26, 2008
Ultra exciting mashups in python ! Check http://zenmachine.wordpress.com/rss-to-twitter-automagic-poster/
From ruby to python and more…
April 20, 2008
Hi. Long time no see eh ?
There is nothing wrong with Ruby. Fine language, great gems and so on. But its interpreter sucks. Since I had to cross compile it for arm, and later fooled around with it while a friend had to run some heavyweight gdb debugging on it, I noticed that most of it was sloppy. Worse than green threads and the stuff that really matters when developing for a high load and concurrency environment, there are places that plainly suck.
It's a shame, because Ruby itself is great, and I really appreciate that you can do everything in a lot of ways. It makes porting libraries from other languages a breeze. I hope it gets better by 1.9 or 2.0.
Meanwhile, I started to do some stuff using python, especially for networked services. Check out my comeback at http://zenmachine.wordpress.com/web-services-and-twisted/
Cheers !
Using ruby to check on text classification algorithms.
January 3, 2008
I've been trying to finish this post since the middle of December, but after about 4 almost complete rewrites I've decided to put it online. I still mean to make it better, because I didn't want to sound cocky or give the wrong impression that it is about evaluating the best text classification algorithm out there.
Here it goes: Practical text classification with Ruby
Thanks to Renato for reviewing it beforehand.
The odds of cross compiling ruby on arm
December 15, 2007
I wrote this guide as a result of one of the cross-compiling oddities I've been thru over the last month. Let me know if you have any suggestions about the build process I've been using.

