
Mumbai’s Data Meet Kicks Off With A Bang

Written by Sanjit Oberai
Mumbai’s first Data Meet kicked off on 30th August at the Sardar Patel Institute of Technology, with 26 people attending. It started with a round of introductions by the attendees, a mix of developers, journalists, students and data enthusiasts.
Ritvvij, the founder of Pykih, then gave an introduction in which he spoke about how important it is to have a data group in Mumbai. (Listen to the recording here.)
The first talk was by Ajaj Kelkar, cofounder of http://hansacequity.com. He gave an introduction to how the recent Open Data movement started about 5-7 years back in the US, where there was a need to move data from the private space into the public space; this was made possible by an active push from transparency groups. The idea spread, and many progressive countries realised that this could be a powerful movement that can be used for public good.
Data can be used to help take decisions on social or personal platforms. He highlighted that there are barriers that we need to overcome. He explained how many cities abroad have appointed Chief Data Officers, whose job is to monitor each city's data, such as municipality budgets.
Privacy was another important point. As consumers, we leave a lot of data on social media platforms. In the absence of proper laws, we need to think carefully about what we put out there.
The second talk was by Ritvvij, founder of http://www.pykih.com, who explained the importance of visualising data correctly. He emphasised that a lot of people are not aware of how to use the correct tools to visualise data, and that the most common mistake people make is with the humble pie chart. He went on to explain the visualisation mistakes that many news organisations are making today and what they could do to improve their charts and graphs.
He also gave examples of his recent work with Firstpost.com, for whom he made custom visualisations for the elections and the IPL. He has also worked with narendramodi.in.
He also spoke about the issues most journalists face when dealing with government data and how difficult it is for them to get access to it. He ended by stating that there is a dire need for a tool that could become the CMS for data journalism, thereby allowing journalists to focus on the story rather than on data janitorial work.
 
 
The third talk was by Sanjit Oberai, Deputy Editor of IndiaSpend, a non-profit that uses data to tell stories. He spoke about how there are tonnes of stories buried in government data and how to write articles around them. He spoke about how they research articles, what the sources of data are, and how one can visualise data using free-to-use tools like Datawrapper, Knoema, Tableau, etc.
He also spoke about a new initiative called Fact Check, which can be used to bring about accountability and raise the common man’s awareness. He cited the example of the Goa MLAs who were going to Brazil, which created quite a stir, with the Congress calling it wasteful expenditure. A quick fact check was done on the assets they had declared in the sworn affidavits candidates provided before the legislative elections in 2012.
 
He also spoke of a Data Room, a first-of-its-kind resource for students, journalists and researchers that will allow comparison of state-wise data on population, health, education, etc.
The last talk was by Srinivas Kodali, an IIT-Madras graduate who is researching the transport data of cities. He explained how one could scrape data from websites and demoed tools like Selenium for scraping sites that generate data on the fly using AJAX. However, Selenium is a front-end tool typically used for testing and requires an open browser session, so it cannot be used for large-scale scraping. He then went on to showcase PhantomJS, which allows scraping as Selenium does but in a headless fashion, i.e. without the browser UI.
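For readers who want to try the headless approach, here is a minimal Python sketch. It is not the code shown at the meet. It assumes Selenium's Python bindings and a local Chrome install, and it swaps in headless Chrome for PhantomJS, since recent Selenium releases no longer ship a PhantomJS driver. The names `fetch_rendered_html` and `extract_rows` are made up for illustration, and the sketch assumes the data of interest ends up in ordinary HTML table cells once the page's scripts have run.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Rows without any <td> cells (e.g. header rows) are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_rows(html):
    """Return the <td> text of each table row found in the HTML."""
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows


def fetch_rendered_html(url):
    """Load a page in headless Chrome and return the HTML after scripts run.

    Assumption: Selenium and Chrome are installed. The import is done here
    so that extract_rows() can be used without Selenium.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # no browser window is opened
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

A call such as `extract_rows(fetch_rendered_html(url))` would then return the table rows as lists of strings. The page source is read as soon as the page loads, so a site that fetches its data later may need an explicit wait before `page_source` is read.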
The next meeting will be held at the end of September. Details will be posted on the site soon.

GeoBLR – PIN Code Extravaganza!

Last week at GeoBLR we discussed the issues around PIN codes. The most important questions were about the processes of the postal system and the availability of reliable spatial data.

A couple of weeks back, Nisha and I started putting together several questions that we would like to get insights on, and we used those as the starting point for the discussion. The meat of the problem really is that nobody knows what the processes are or how to get that information.

Prior to GeoBLR, we met some people who are interested in the same issue and clarified a lot of things. For instance, we are now sure that a single post office can sometimes deal with more than one PIN code.

To get a sense of how people felt about the PIN code issues, we asked around. Some people don’t bother to use PIN codes for any substantial service other than sending postcards. As long as we are not able to tie PIN codes to geographic locations reliably, they are not so useful. Everybody agrees that the PIN code has immense potential, just because it’s the only part of the address that everybody gets right (most of the time).

We also started to brainstorm how to come up with a plan so that a group like ours along with several other partners could work together to attempt to crowdsource the issue. Read more about the plan and next steps here!


Crosspost: The Hindu’s Rape Statistics Story

A few weeks ago The Hindu’s Data Blog ran a three-part series looking at data on rape cases in Delhi. It was a powerful story that had a lot of people talking, and a good example of what can be done with the data available. Rukmini S has written a piece detailing how she combed through the data to get the story.

Below is an excerpt.

How we put together the statistics that went into our investigation

“Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online – and promptly – and these can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know – if you think there was a way I could have done it, do leave me a comment).

For one, while rape cases are sessions-triable, and so should show up as ‘sessions case’ in the nomenclature, for some judges the cases were inexplicably classified as “criminal cases”. Then, while a simple text-search for ‘376’ should have been enough to get me all cases, the text-search function inexplicably collapsed around March 2014. With elections coming up, I had limited time to work on this and had to essentially open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function revived after two months.”

Read the rest here.
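The per-judgement search for ‘376’ described in the excerpt can be sketched in Python as below. This is only an illustration, not the method The Hindu used. It assumes the judgement texts have already been saved locally as plain text, and the names `mentions_section_376` and `filter_judgements` are hypothetical.

```python
import re

# Match "376" only when it is not part of a longer number, so that
# "13760" or "3761" are ignored while "376", "376(2)(g)" and "376D" match.
SECTION_376 = re.compile(r"(?<!\d)376(?!\d)")


def mentions_section_376(text):
    """Return True if the text contains a standalone '376'."""
    return SECTION_376.search(text) is not None


def filter_judgements(judgements):
    """Given a {case_id: text} dict, return the ids whose text mentions 376."""
    return [case_id for case_id, text in judgements.items()
            if mentions_section_376(text)]
```

A bare number match will still produce false positives, for instance a case numbered 376 of some year, so the hits would need to be read and checked by hand.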

[ODCBLR2014] Address by TS Krishnamurthy

At this year’s Open Data Camp Bangalore, Mr. T. S. Krishnamurthy addressed the audience.

From Wikipedia:

Taruvai Subayya Krishnamurthy (born 1941) was the Chief Election Commissioner (C.E.C) of India (February 2004 – May 2005).[1] His main assignment as C.E.C was to oversee the 2004 elections to the Lok Sabha. He was known for his integrity and a polite yet firm fist with which he handled all sensitive assignments throughout his career. He had earlier served in the Election Commission of India as a commissioner since January 2000.