icalendar gem

November 19, 2007

iCalendar (iCal) is a standard for calendar data interchange. There’s a gem called icalendar that helps to parse and generate such files, so you can use data from your Google or Exchange calendar to feed your app (or make your app generate data to feed your calendar, e.g., a link to Digg or Facebook in each post of your blog that sets up a TODO item).

To parse an .ics file (an iCal invite or TODO item), it’s just a matter of looping through the elements of a given calendar. An .ics file may hold more than one calendar, and each calendar may contain events and TODO items.

#!/usr/bin/env ruby
require 'rubygems'
require 'icalendar'

if ARGV.size < 1
  puts "Usage: ical_parse.rb <calendar.ics>"
  exit
end

# Icalendar.parse returns an array with one entry per calendar in the file.
# The block form of File.open closes the file once parsing is done.
cals = File.open(ARGV[0]) { |cal_file| Icalendar.parse(cal_file) }
if cals.empty?
  puts "Empty calendar"
  exit
end

cals.each do |c|
  puts "\nEvents\n\n"
  if c.events.empty?
    puts "Empty event list"
  else
    c.events.each do |e|
      puts "---------------------------------------"
      # Interpolation copes with optional properties that come back as nil.
      puts "Seq: #{e.sequence}"
      puts "UID: #{e.uid}"
      puts "DTSTART: #{e.dtstart}"
      puts "summary: #{e.summary}"
      puts "location: #{e.location}"
      puts "description: #{e.description}"
      unless e.attendees.nil?
        puts "attendee: "
        e.attendees.each do |a|
          # A mailto URI exposes the address via #to; otherwise print the value as is.
          puts "\t#{a.respond_to?(:to) ? a.to : a}"
        end
      end
      puts "---------------------------------------"
    end
  end

  puts "\nTODO\n\n"
  todos = c.todos
  if todos.empty?
    puts "Empty TODO list"
  else
    puts "---------------------------------------"
    todos.each do |todo|
      puts "Seq: #{todo.sequence}"
      puts "UID: #{todo.uid}"
      puts "DTSTART: #{todo.dtstart}"
      puts "summary: #{todo.summary}"
    end
    puts "---------------------------------------"
  end
end
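
The nesting the script walks through (one file, several calendars, each with events and TODO items) can also be seen in the raw text of an .ics file. Here is a minimal sketch that uses no gem at all: it scans the text for BEGIN: lines and tallies the components per calendar. The sample data and the helper name count_components are made up for illustration, and real files have line folding and other details that this ignores.

```ruby
# Hypothetical sample: one .ics text holding two calendars.
SAMPLE_ICS = <<~ICS
  BEGIN:VCALENDAR
  VERSION:2.0
  BEGIN:VEVENT
  UID:event-1@example.com
  SUMMARY:Team meeting
  END:VEVENT
  BEGIN:VTODO
  UID:todo-1@example.com
  SUMMARY:Write report
  END:VTODO
  END:VCALENDAR
  BEGIN:VCALENDAR
  VERSION:2.0
  BEGIN:VEVENT
  UID:event-2@example.com
  SUMMARY:Lunch
  END:VEVENT
  END:VCALENDAR
ICS

# Returns one hash per VCALENDAR block, e.g. { events: 1, todos: 1 }.
def count_components(ics_text)
  calendars = []
  ics_text.each_line do |line|
    case line.strip
    when "BEGIN:VCALENDAR"
      calendars << { events: 0, todos: 0 }
    when "BEGIN:VEVENT"
      calendars.last[:events] += 1 unless calendars.empty?
    when "BEGIN:VTODO"
      calendars.last[:todos] += 1 unless calendars.empty?
    end
  end
  calendars
end

count_components(SAMPLE_ICS).each_with_index do |counts, i|
  puts "Calendar #{i + 1}: #{counts[:events]} event(s), #{counts[:todos]} TODO item(s)"
end
```

The gem does this bookkeeping (and much more) for you; the sketch is only meant to show why the parse result is a list of calendars rather than a single one.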

 	

When scraping a website for information, the most time-consuming part is locating what you need and working out how it’s enclosed. Most of the time, automatically generated HTML is pretty convoluted because of templating systems. Handmade HTML tends to be cleaner, but it’s not so common these days.

Firebug is an extension for Firefox which, among other things, can help you find URLs and XPath expressions for certain elements, discover action names, find out how forms are handled and so on.

Having a full XPath or the right URL for a form in a few clicks is a great productivity improvement. To show how to do it, I will download the contacts stored in my Gmail account.

First and foremost, we need to know how to export contacts manually. It’s a matter of logging in, clicking the Contacts link below your folders, clicking the export button, selecting the proper options (All contacts, Outlook CSV format) and clicking another export button.

More than XPath harvesting, what we need is an automation tool that can navigate to the right URL. Better yet, if we know the URL of the export action, we don’t need to simulate ‘clicking’ the way most automation libraries do.

Apart from Firefox loaded with Firebug, we will use Ruby and WWW::Mechanize. WWW::Mechanize uses Hpricot to handle XPath and has nice features such as a cookie jar to handle all cookies, redirect following and form handling.

The first step is to log in using Gmail’s form. It’s a simple HTML form, the first one on the page. Let’s find out the names of the input fields. Start Firefox, point it to https://www.gmail.com and activate Firebug by clicking the icon in the lower right corner.

login page

Use the inspect feature to see the HTML code for a given element. Inspect may also return its full XPath or DOM name. Take some time to explore the login screen and note that the field names are Email and Passwd, and that they are case-sensitive. To log in using WWW::Mechanize, the code would look like this:

agent = WWW::Mechanize.new { |obj| obj.log = Logger.new('gmail.log') }
page = agent.get('https://www.gmail.com')

form = page.forms.first
form.Email = 'username'
form.Passwd = 'passwd'

page = agent.submit(form)

After logging in, Mechanize takes care of any redirection and cookies, so we may proceed to request any other resource.
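
Going back to the field names for a moment: the case-sensitivity is easy to trip over, so here is a small sketch of the point using only plain Ruby. The HTML is a made-up, trimmed-down stand-in for the login form (the real page has more fields), and input_names is a hypothetical helper. A regex is good enough for this toy fragment, but not for arbitrary HTML.

```ruby
# Hypothetical, heavily trimmed login form used only for illustration.
LOGIN_HTML = <<~HTML
  <form action="/login" method="post">
    <input type="text" name="Email">
    <input type="password" name="Passwd">
    <input type="submit" value="Sign in">
  </form>
HTML

# Collects the name attribute of every <input> tag that has one,
# keeping the original capitalisation of the value.
def input_names(html)
  html.scan(/<input\b[^>]*\bname="([^"]+)"/i).flatten
end

names = input_names(LOGIN_HTML)
puts names.inspect

# The lookup is case-sensitive: 'Email' is a field, 'email' is not.
puts "Email found: #{names.include?('Email')}"
puts "email found: #{names.include?('email')}"
```

This is the same information Firebug’s inspect gives you with a click; the sketch only shows why the names have to be typed exactly as they appear in the page.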

Our goal is to export a contact list, and clicking our way to it is not the smartest idea. We need the exact URL to get it. Let’s find it:

contact management

export contacts

Enable Firebug, select the ‘Net’ tab and click export.

Contact export screen

Check Firebug’s console for the list of net requests. There we will find the exact URL we need:

network requests

Mouse over the items to see the URL value. In Gmail it will be the one labelled export, but go on and look at the other background requests the page makes.
contact list download

The contact list export URL is http://mail.google.com/mail/contacts/data/export?exportType=ALL&groupToExport=&out=OUTLOOK_CSV.
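
To see what that URL is made of, here is a short sketch using Ruby’s standard uri library. It splits the query string into its parameters and then rebuilds the same URL from the parts. Nothing here talks to Gmail. The constant and variable names are mine, and whether the server accepts other parameter values is an assumption I have not tested.

```ruby
require 'uri'

# The export URL exactly as captured in Firebug's Net tab.
EXPORT_URL = 'http://mail.google.com/mail/contacts/data/export?exportType=ALL&groupToExport=&out=OUTLOOK_CSV'

uri = URI.parse(EXPORT_URL)

# Split the query string into a Hash of parameter name => value.
params = URI.decode_www_form(uri.query).to_h
params.each { |name, value| puts "#{name} = #{value.inspect}" }

# Rebuild the URL from its parts. Changing a value in params before this
# step would give a variant of the request (untested assumption).
rebuilt = URI::HTTP.build(
  host:  uri.host,
  path:  uri.path,
  query: URI.encode_www_form(params)
)
puts rebuilt
```

Keeping the parameters in a Hash makes it easier to see which options from the export screen ended up in the request.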

After logging in, it’s a matter of just requesting this URL and saving the file:

page = agent.get('http://mail.google.com/mail/contacts/data/export?exportType=ALL&groupToExport=&out=OUTLOOK_CSV')

page.save_as('gmail_contacts.csv')

And that’s it.

Check Firebug’s documentation and scripts to learn other ways it can save you heavy work. See the full script below.

------- gmail-scrap.rb -------

#!/usr/bin/env ruby

require 'rubygems'
require 'mechanize'
require 'logger'

agent = WWW::Mechanize.new { |obj| obj.log = Logger.new('gmail.log') }

page = agent.get('https://www.gmail.com')

form = page.forms.first
form.Email = 'username'
form.Passwd = 'passwd'

page = agent.submit(form)

page = agent.get('http://mail.google.com/mail/contacts/data/export?exportType=ALL&groupToExport=&out=OUTLOOK_CSV')

page.save_as('gmail_contacts.csv')

------- gmail-scrap.rb -------