Data.Gov.In Beta is Up

Data.gov.in has just launched its beta version. With 13 datasets already online and downloadable, it is a good start. Granted, not everything is working smoothly, but given how new it is, I’m feeling pretty good about the effort.

As with most things, what matters is not intentions or first steps but implementation and continued support. This is where the difficulty will lie for the Indian Government’s IT department, the National Informatics Centre. They will be running the site and implementing the portal, and I hope they can also act as an intermediary between citizens and the ministries that are providing the data.

Unlike some other government open data portals, the data does not have to stay on the portal, according to the governing policy, the National Data Sharing and Accessibility Policy. The individual ministry keeps control of its data and can still charge for it, decline to make it downloadable, or restrict it without telling you why. Also, incredibly valuable datasets like the Census and the National Sample Survey (official link not working) will not be available for download.

Reflections of Chennai’s Data Workshop from India Water Portal

Cross Posted from India Water Portal

Written by Aarti Kelkar-Khambete

This workshop, organised by Transparent Chennai at The Institute of Financial Management and Research, Chennai, grew out of the earlier open data camp events organised by Transparent Chennai in Bangalore and Hyderabad. There was wide discussion among attendees at those events who were excited by the potential of data and the open data movement, but who did not have the necessary skills or technical background to work effectively with it.
It was felt that there was a much larger community of activists, researchers and non-profits who could benefit from learning to use the kinds of tools presented at the camps. Thus, this event was planned differently from a data camp and focused on training activists, researchers and students to work with data. Participants would learn about open data, data visualisation, spatial data and practical issues that come up when working with data in various forms.

The workshop thus aimed at helping the participants to:

  • Understand various formats of data, diverse possibilities of data visualisation and effective tools for doing so, with a special focus on web-based tools
  • Understand how to think through projects involving collection, processing and visualisation of data
  • Develop a basic understanding of software packages and methods for visualising quantitative data, creating geo-visualisation and undertaking participatory mapping
  • Understand the connection between data technologies and rights to access and use data.

Read the rest of the summary here.

Data Science meets Data Technology

The Big Picture
August 18th, 2012
10:00 AM – 12:30 PM
Conference Room 2
NIAS, Bangalore

While data is increasingly important in academia as well as in industry, the two worlds do not intersect each other all that often. DSDT is a monthly forum for sharing ideas about data across disciplines and industries. Each DSDT meeting will consist of two talks on a common theme, pairing a data scientist with a data technologist along with time for discussion. From the second session onward, we will have a tutorial and hacking session after the talks where we will learn how to work on understanding and analysing data sets relevant to that meeting’s theme. The schedule for the first meeting on the 18th at NIAS is given below.

10:00 AM – 10:30 AM: Tea
10:30 AM – 11:00 AM: Analysis: The Big Picture. Rajesh Kasturirangan, NIAS.
11:00 AM – 11:15 AM: Discussion of talk 1.
11:15 AM – 11:45 AM: Data and Visualization: The Big Picture. S. Anand, Gramener.
11:45 AM – Noon: Discussion of talk 2.
Noon – 12:30 PM: General Discussion.
For more information, visit

Scrapathon 1: Rajasthan Rain Water Data

Cross Posted from Rajasthan Rainfall Data (1957 to 2011) by Thejesh GN

The Rajasthan rainfall data was scraped as part of the Scrapathon held in Bangalore on 21st July 2011. Initially I used ScraperWiki, but the huge amount of data made it time out all the time 🙂 so I wrote a simple Python script to do it.

The data is in the SQLite file data.sqlite, in a table called rainfall. It has 6,61,459 rows.

PAGE_ID refers to the ID in the table webpages, which lists the webpages from which the data were scraped. It will help you in case you want to cross-check. The rest of the columns are self-explanatory. I have signed the SQLite database using my GPG keys, and the signatures are inside the file data_gpg_signature.sig.
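
As a rough sketch of how the cross-check could be done in Python, the snippet below joins rainfall to webpages on PAGE_ID. The table names and the PAGE_ID column come from the description above. The ID column name in webpages is inferred from that description, and the helper names are made up for illustration. Since the other columns are not listed here, the query selects everything rather than guessing names.

```python
import sqlite3


def count_rainfall_rows(conn):
    """Return the number of rows in the rainfall table."""
    return conn.execute("SELECT COUNT(*) FROM rainfall").fetchone()[0]


def rows_with_source(conn, limit=5):
    """Return a few rainfall rows joined to the webpage they were scraped from.

    Assumption: webpages has a column named ID that rainfall.PAGE_ID points to.
    """
    query = (
        "SELECT r.*, w.* "
        "FROM rainfall AS r "
        "JOIN webpages AS w ON w.ID = r.PAGE_ID "
        "LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

To try it, open the file with sqlite3.connect("data.sqlite") and pass the connection to either function. If the ID column is named differently in the actual file, the join condition needs adjusting.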

You can download my public key from any keyserver or from biglumber.

You can download it here for now. I will try to make it available via torrent later.

PIN code mapping

Where: Skype ID: datameet



  • We’ll go for bulk geo-coding as opposed to crowd-sourcing
  • We’ll bulk source addresses. Please add any other sources you can think of:
      • The Postal College’s list of post offices
      • Branch lists from banks such as SBI, or organisations like BSNL
      • Telephone directories
  • We’ll run them through Yahoo’s Placefinder, which is liberal in API limits and in licensing
  • We’ll create Voronoi treemaps out of those (ideally as OpenStreetMap XML files)
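
One way to picture the Voronoi step: once each post office has been geocoded, a location belongs to the Voronoi cell of whichever post office is nearest to it. The sketch below does only that nearest-point lookup in plain Python. The function names, PIN codes and coordinates are made up for illustration, and it does not call Placefinder or produce OpenStreetMap XML.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_pin(lat, lon, post_offices):
    """Return the PIN code of the geocoded post office closest to (lat, lon).

    post_offices is a list of (pin, lat, lon) tuples. Picking the nearest
    point is the same as asking which Voronoi cell the location falls in.
    """
    pin, _, _ = min(
        post_offices,
        key=lambda po: haversine_km(lat, lon, po[1], po[2]),
    )
    return pin


# Hypothetical geocoded post offices: (PIN, latitude, longitude).
POST_OFFICES = [
    ("000001", 12.97, 77.59),
    ("000002", 13.08, 80.27),
    ("000003", 17.39, 78.49),
]
```

Calling nearest_pin(12.95, 77.60, POST_OFFICES) picks the first entry, since it is by far the closest point. A linear scan like this is fine for a sketch; a full list of post offices would probably call for a spatial index.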

Links mentioned during the meet:

Text & Geo processing

Where: Skype ID: datameet


  • Introductions [everyone, 10 seconds each]
  • Discussion on the most interesting visualisation you’ve seen recently
  • Discussion on any sources of data you’ve come across
  • Recording of the talk is available

Links mentioned during the meet:

Quote of the call: “this call started with no agenda, but ended with quite hands full. happy. nothing more to add” — Balaganesh

R, Processing, Protovis

Where: Skype ID: datameet


  • Introductions [everyone, 10 seconds each]
  • Joining the mailing list, and sharing articles related to data science via RSS [Thej, 3 min]
  • Splitting up sections on the “Wiki” between ourselves to populate content [Manu, 3 min]
  • Adding to the directory and data store [Anand, 2 min]
  • Talks [The links have audio]:
      • Learning R [Anand]
      • Processing & Protovis [Arun & Venkat]
  • Indian budget visualisation [All, 5 min]
  • Taking the physical community forward [Bala, 2 min]

Links mentioned during the meet:

DataMeet is a community of Data Science and Open Data enthusiasts.