Data Science meets Data Technology

The Big Picture
August 18th, 2012
10:00 AM – 12:30 PM
Conference Room 2
NIAS, Bangalore

While data is increasingly important in academia as well as in industry, the two worlds do not intersect all that often. DSDT is a monthly forum for sharing ideas about data across disciplines and industries. Each DSDT meeting will consist of two talks on a common theme, pairing a data scientist with a data technologist, followed by time for discussion. From the second session onward, we will have a tutorial and hacking session after the talks, where we will learn to understand and analyse data sets relevant to that meeting’s theme. The schedule for the first meeting on the 18th at NIAS is given below.

10:00 AM – 10:30 AM: Tea
10:30 AM – 11:00 AM: Analysis: The Big Picture. Rajesh Kasturirangan, NIAS.
11:00 AM – 11:15 AM: Discussion of talk 1.
11:15 AM – 11:45 AM: Data and Visualization: The Big Picture. S. Anand, Gramener.
11:45 AM – Noon: Discussion of talk 2.
Noon – 12:30 PM: General Discussion.
For more information, visit http://analysis.knofu.org/

Scrapathon 1: Rajasthan Rain Water Data

Cross Posted from Rajasthan Rainfall Data (1957 to 2011) by Thejesh GN

The Rajasthan rainfall data was scraped as part of the Scrapathon held in Bangalore on 21st July 2011. Initially I used ScraperWiki, but the huge amount of data made it time out all the time 🙂 so I wrote a simple Python script to do it.

The data is in the SQLite file data.sqlite, in a table called rainfall. It has 6,61,459 rows.
Columns: DISTRICT, STATION, YEAR, MONTH, DAY, RAIN_FALL, PAGE_ID

PAGE_ID refers to the ID in the table webpages, which lists the webpages from which the data were scraped. It will help you in case you want to cross-check. The rest of the columns are self-explanatory. I have signed the SQLite database using my GPG keys, and the signatures are inside the file data_gpg_signature.sig
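As a rough illustration of how the rainfall table could be queried, here is a minimal Python sketch using the standard sqlite3 module. The table and column names come from the description above; the function name station_year_totals is hypothetical, and the sketch assumes RAIN_FALL holds numeric values and that district names are stored as plain text.

```python
import sqlite3


def station_year_totals(conn, district):
    """Return [(station, year, total_rainfall), ...] for one district.

    Assumes the schema described above: a table called rainfall with
    columns DISTRICT, STATION, YEAR and RAIN_FALL (numeric).
    """
    query = """
        SELECT STATION, YEAR, SUM(RAIN_FALL)
        FROM rainfall
        WHERE DISTRICT = ?
        GROUP BY STATION, YEAR
        ORDER BY STATION, YEAR
    """
    # A parameterised query avoids quoting problems in district names.
    return conn.execute(query, (district,)).fetchall()
```

To try it, open the file with sqlite3.connect("data.sqlite") and pass the connection along with a district name exactly as it is spelled in the DISTRICT column.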

You can download my public key from any keyserver or from biglumber.

You can download it here for now. I will try to make it available as a torrent later.

PIN code mapping

Where: Skype ID: datameet

Agenda:

Summary:

  • We’ll go for bulk geo-coding as opposed to crowd-sourcing
  • We’ll bulk source addresses. Please add any other sources you can think of:
      • The Postal College’s list of post offices
      • Branch lists from banks such as SBI, or organisations like BSNL
      • Telephone directories
  • We’ll run them through Yahoo’s Placefinder, which is liberal in API limits and in licensing
  • We’ll create Voronoi treemaps out of those (ideally as OpenStreetMap XML files)
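
The Voronoi idea in the summary above can be sketched in a few lines: once post offices are geocoded, any location falls in the Voronoi cell of its nearest post office. The Python sketch below only shows that nearest-site lookup, not the geocoding step or the OpenStreetMap export. The function names and SAMPLE_OFFICES are hypothetical, and the coordinates are rough illustrative values, not geocoder output.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_pin(lat, lon, offices):
    """Return the PIN code of the office closest to (lat, lon).

    offices is an iterable of (pin, lat, lon) tuples. A point lies in
    the Voronoi cell of its nearest site, so this is cell membership.
    """
    closest = min(offices, key=lambda o: haversine_km(lat, lon, o[1], o[2]))
    return closest[0]


# Hypothetical sample: approximate coordinates, for illustration only.
SAMPLE_OFFICES = [
    ("560001", 12.98, 77.59),  # roughly central Bangalore
    ("302001", 26.92, 75.82),  # roughly central Jaipur
]
```

A linear scan like this is fine for a handful of offices; for the full list of post offices a spatial index would be the more sensible choice.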

Links mentioned during the meet:

Text & Geo processing

Where: Skype ID: datameet

Agenda:

  • Introductions [everyone, 10 seconds each]
  • Discussion on the most interesting visualisation you’ve seen recently
  • Discussion on any sources of data you’ve come across
  • Recording of the talk is available

Links mentioned during the meet:

Quote of the call: “this call started with no agenda, but ended with quite hands full. happy. nothing more to add” — Balaganesh

R, Processing, Protovis

Where: Skype ID: datameet

Agenda:

  • Introductions [everyone, 10 seconds each]
  • Joining the mailing list, and sharing articles related to data science via RSS [Thej, 3 min]
  • Splitting up sections on the “Wiki” between ourselves to populate content [Manu, 3 min]
  • Adding to the directory and data store [Anand, 2 min]
  • Talks [The links have audio]:
      • Learning R [Anand]
      • Processing & Protovis [Arun & Venkat]
  • Indian budget visualisation [All, 5 min]
  • Taking the physical community forward [Bala, 2 min]

Links mentioned during the meet:

DataMeet is a community of Data Science and Open Data enthusiasts.