
DataMeet-Up Report, Delhi

Notes from DataMeet-Up in Delhi, 22 November 2013

November 27, 2013 Sumandro

We had a DataMeet-Up on Friday, November 22, 2013, at the Akvo office in Yusuf Sarai Community Centre, Delhi.

Here are the notes from the meet-up [additional information in square brackets]:

Election Data Hackathon

We will undertake a collaborative mapping of datasets relevant for the election data hackathon, using GitHub and Google Drive. More details about this below.
Datasets that we are trying to locate include: election results data (total vote count, vote count per party/candidate, etc.), total utilisation and composition of utilisation of MP Local Area Development funds, parliamentary activities of MPs (presence/absence, questions asked, bills discussed, committees joined, etc.), crime data corresponding to constituencies, etc.
We will identify organisations that might hold additional relevant data, such as PRS Legislative Research, Association for Democratic Reforms (and MyNeta.info), Gramener, and Hindustan Times [Anika used to work at HT].
Two caveats: (1) we may not get unique and standard identifiers across datasets, and (2) calculations may get difficult in case of by-elections [Lok Sabha Secretariat will have details of all by-elections, which can be accessed through an RTI request].
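To illustrate the first caveat, here is a minimal Python sketch of one possible workaround: deriving a rough join key from the state and constituency names when two datasets share no standard identifier. The function names, field names and rows are all hypothetical placeholders, not taken from any actual dataset, and the sketch does not handle by-elections (where one constituency may have more than one result row).

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = re.sub(r"\(.*?\)", " ", text)      # drop bracketed tags such as "(SC)"
    text = re.sub(r"[^a-z0-9]+", " ", text)   # punctuation and hyphens become spaces
    return "-".join(text.split())


def constituency_key(state: str, constituency: str) -> str:
    """Hypothetical join key of the form 'state/constituency'."""
    return f"{normalise(state)}/{normalise(constituency)}"


def merge_on_key(left, right):
    """Join two lists of records on the derived key; return (merged, unmatched)."""
    index = {constituency_key(r["state"], r["constituency"]): r for r in right}
    merged, unmatched = [], []
    for row in left:
        key = constituency_key(row["state"], row["constituency"])
        match = index.get(key)
        if match is None:
            unmatched.append(row)  # needs manual review
        else:
            merged.append({**match, **row, "key": key})
    return merged, unmatched


# Made-up placeholder rows, not real election or fund-utilisation figures.
results = [
    {"state": "Delhi", "constituency": "New Delhi", "total_votes": 100},
    {"state": "Delhi", "constituency": "Chandni Chowk", "total_votes": 80},
]
funds = [
    {"state": "DELHI", "constituency": "new-delhi ", "funds_used": 5},
]

merged, unmatched = merge_on_key(results, funds)
print(merged)
print(unmatched)
```

Name-based keys like this are fragile (spelling variants and renamed constituencies will not match), so the unmatched rows are returned separately for manual checking rather than silently dropped.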

Hack for Change on Women’s Rights

Shobha, Breakthrough.tv, led the discussion on the planned Hack for Change event being organised by Breakthrough and Hacks/Hackers, as part of the 16 days of activism against violence against women.
The hackathon is organised around urban safety data from Whypoll, multimedia evidence of early marriage practices in Bihar and Jharkhand gathered by Gramvaani, etc. It will also include a Wikipedia Edit-athon facilitated by Noopur Raval.
There were multi-directional discussions around other datasets of relevance for the hack event, which I have not kept track of very well. Overall, there were discussions around datasets available from , those published by National Crime Records Bureau, FIR and call database of Delhi police (and how to access that), and data on violence against women gathered by Tata Institute of Social Sciences from police stations across seven states.

Presentation on iPython

Konark Modi presented a detailed introduction to using iPython to undertake data cleaning in a very organised manner, as well as the collaboration features/workflow of iPython.
There emerged a demand for a tutorial on OpenRefine (previously Google Refine), which will be organised in a later meeting.

Mapping Indian Election Data

We will start documenting publicly available datasets relevant for studying past General (Lok Sabha) elections in India and the activities of the elected members at present. One can contribute to this mapping exercise in two ways, as mentioned below.
GitHub: We have created a repository for this data mapping exercise under the DataMeet organisation at GitHub. The organisation page can be accessed here, and the (india-election-data) repository can be accessed here. In the repository, I have created a draft format for documenting the identified datasets. This draft format can be accessed here. Please feel free to suggest changes to the draft format by opening an issue.
To document a dataset, use the format given in the repository, fill in the details, and rename the file according to the dataset’s name, such as “election-results-delhi-1995.md”. If you then notice any need for data cleaning/reorganisation, or any lack of clarity regarding the dataset, open an issue (mentioning the name of the dataset) to note that task.
Google Drive spreadsheet: Alternatively, you can access this spreadsheet on Google Drive and add the relevant information about the dataset documented by you.
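For the GitHub route, the file-naming convention described above can be sketched in a few lines of Python. This is only an illustration: the helper name `dataset_filename` is hypothetical and is not part of the repository, and the exact naming rules are assumed from the single example “election-results-delhi-1995.md”.

```python
import re


def dataset_filename(dataset_name: str) -> str:
    """Turn a dataset name into a lower-case, hyphen-separated .md filename."""
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", dataset_name.lower()).strip("-")
    if not slug:
        raise ValueError("dataset name has no usable characters")
    return slug + ".md"


print(dataset_filename("Election Results, Delhi 1995"))
# prints: election-results-delhi-1995.md
```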

Please comment here or post to the DataMeet mailing list for any clarifications and suggestions.

DataMeet-Up Report, Delhi

Notes from DataMeet-Up in Delhi, 31 May 2013

June 3, 2013 Sumandro

Authors: Satyakam Goswami & Nasr ul Hadi

Venue

Akvo Foundation, Yusuf Sarai, New Delhi

Discussion

This was our second meet-up in Delhi. We began with a round of introductions and a brief discussion about DataMeet’s (Bangalore-centric) history.

Data Journalism

Nasr shared his experiences trying to apply data science to journalism and how, in most cases, the data was either incomplete, not specific enough or simply inaccessible. One of his projects involved patrolling with the Delhi Police and tracking the number of accident victims they transported to the Jai Prakash Narayan Trauma Center. Out of curiosity, Satyakam asked him more about how the trauma centre worked, because they have been using a FOSS SMS service to reduce clogging of their registration/reception counters.

Nasr also discussed the work of organisations like Human Rights Law Network and the immense amount of data they have on instances of abuse, child incarceration, etc. in Indian prisons. Unfortunately, these records aren’t digital. Guneet gave a similar description of police reform work done by CHRI.

Collaborations with Other Groups

We also discussed opportunities to collaborate with other meetups that share our interest in data science. Hacks/Hackers’ Delhi chapter conducted a news apps hackathon in February. Their next event will focus on data journalism, probably along the lines of the recent Editors’ Lab. We still need to discuss how to:

educate journalists on the importance of data-driven journalism, and
equip them with the tools and possible collaborations with people in this group.

We are also open to other such collaborations, but couldn’t think of any other community with a similar interest. If you know of any, please do let us know.

Elections 2014 and Public Infrastructure

Everyone was interested in building apps around the upcoming Delhi Elections and then the General Elections in 2014. If anyone is interested in taking the lead on this, please do so as soon as possible.

Akvo’s Isha told us how they are working closely with organisations that collect a substantial amount of district-level data about water/sanitation infrastructure. To collect data in the field, they use/promote a self-developed Android-based software called FLOW.

Vivek also brought up data.gov.in and how he found some of their data to be incomplete. Satyakam suggested we work closely with the data.gov.in team in order to get the data we want. Developers in attendance suggested we ask the portal for a web services API. Surendran described how they do this at Change.org. Google’s Raman Jit joined us close to the wrap-up and offered to help with sourcing any data we need from various government stakeholders.

We still have to decide on a date and venue for the June meet.

Participants

Guneet Narula, Sputznik

Isha Parihar, Akvo

Ashim Kapoor

Nasr ul Hadi

Vivek Khurana, Mintango Technologies

Surendran Balachandran, Change.org

Gaurav, Film Maker

Gora Mohanty

Irshad Reyaz, Landshark Labs

Rahul

Raman Jit Singh Chima, Google India

Satyakam Goswami, Consultant

DataMeet-Up Report, Delhi

Notes from DataMeet-Up in Delhi, 12 April 2013

April 16, 2013 Sumandro

Last Friday (12th April), a DataMeet-Up was organised at Sarai-CSDS, New Delhi.

There has been talk of a meeting like this in Delhi for a long time now. A few hackathons and data-related events have been organised in the past (see this and this). With the recently concluded Open Data Camp in Bangalore and the substantial buzz (at least in Delhi) created by the 12th Plan Hackathon earlier this month, the timing of this DataMeet-Up was quite apt to take a step back, focus on big picture issues, and keep building the open data community in Delhi.

The meet-up began with a round of introductions, and a brief discussion about the DataMeet group and its (Bangalore-centric) history.

This was followed by a discussion of the 12th Plan Hackathon experience. The presence of members of prize-winning teams (from the IIT Delhi venue) and representatives of the NDSAP-PMU (NDSAP Project Management Unit) team energised the conversation. The hackathon was found to be a positive experience overall. It especially succeeded in creating a set of initiatives to address the difficult task of making planning documents, statistical evidence and proposals for allocation more accessible to (at least some of) the people of the country. The demanding nature of the datasets and documents made available for the hackathon (in terms of the required background understanding of the themes) perhaps led a number of submissions to engage with issues not directly associated with the 12th Plan.

From the experiences of the hackathon, we quickly moved to talk about ‘what’s next?’. Responding to the demand for a public API to access the datasets hosted at the data portal, Subhransu and Varun from NIC talked about the ongoing efforts to develop the second version of the OGPL software that the data portal uses. Unlike the existing version, the second version will not host uploaded datasets as individual files but as structured/linked data (following RDF specifications). While this was of great interest to some of the participants in the meet-up, it was not immediately clear to everybody why this shift (from separate data files to structured/linked data) is such a big deal. So we spent some time discussing the semantic web, linked data and APIs.
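As a rough illustration of why the shift from separate files to linked data matters, here is a small Python sketch of the underlying idea: each fact becomes a (subject, predicate, object) triple, and rows from different datasets that share a subject identifier can be combined without custom matching. It uses plain tuples rather than an RDF library, and the `http://example.org/` identifiers, column names and values are all made up for this example; it does not reflect the vocabulary or design of OGPL or the data portal.

```python
# Hypothetical namespace; not the one used by OGPL or data.gov.in.
EX = "http://example.org/"


def row_to_triples(subject_id, row):
    """Turn one table row into (subject, predicate, object) triples."""
    subject = EX + subject_id
    return [(subject, EX + column, value) for column, value in row.items()]


def describe(graph, subject):
    """Collect everything known about one subject, across all source datasets."""
    return {p.rsplit("/", 1)[-1]: o for s, p, o in graph if s == subject}


# Two rows that would otherwise live in two separate files (placeholder values).
census_row = {"name": "District A", "population": 1000}
water_row = {"handpumps": 12}

graph = set()
graph.update(row_to_triples("district/a", census_row))
graph.update(row_to_triples("district/a", water_row))

print(describe(graph, EX + "district/a"))
```

Because both rows point at the same subject identifier, a single lookup returns the census and water facts together, which is roughly what an API over linked data could offer that a folder of individual files cannot.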

Another thread of the ‘what’s next?’ discussion explored official and unofficial processes for requesting government agencies to publish specific datasets on the data portal, and also how (and whether) non-government agencies can share cleaned-up government datasets through the data portal. We talked about approaching Data Controllers for the agencies concerned, endorsing each other’s data requests to drive community-based demands, and also the possibility of an alternative portal (perhaps using OGPL itself) to share governmental datasets cleaned up and reformatted by individuals and organisations. The group also noted and celebrated the initiative from NIC to use GitHub for storing and sharing code and data used in visualisations, data cleaning processes and data-based applications.

The challenge of unclear licensing of data, both in the case of bought data products (such as Census of India and National Sample Survey) and publicly available datasets, was flagged but was not discussed fully.

Next, we had a round of inputs from the participants on what kind of data they have been collecting and working with, and what software they use.

Akvo is working closely with organisation(s) that collect a substantial amount of district-level data in certain focus regions, including water infrastructure, quality and usage data. For field-level data collection, they use/promote a self-developed Android-based software called FLOW. They mentioned that the absence of good-quality basemaps (either vector or satellite imagery-based) for the areas they are working in makes environmental data collection rather difficult. The changed Google Maps API terms of use are forcing them to consider other options and move off Google (Satellite) Maps. It was suggested that they should try using imagery from Bhuvan. They also expressed interest in using data from and contributing to the OpenStreetMap project.

Accountability Initiative, located in the Centre for Policy Research in New Delhi, collects, digitises, cleans up, uses, analyses and archives a substantial volume of national and state budget data and utilisation reports. They, however, cannot share the (digitised and cleaned up) data publicly due to ambiguous and missing licence agreements. They are producing a large volume of data analysis but relatively few visualisations. They mostly use Microsoft Excel and Stata for their data operations. Picking up the thread from Vibhu (of Accountability Initiative), Subhransu (of NIC) talked about the data cleaning challenges NIC is facing while working with various government agencies to open up their respective datasets.

Ravi and Pratap talked about the data usage situation in the journalism world. They mentioned that most journalists prefer accessing government data in hard (printed) copies, as that is seen as a permanent, easily archivable, and easily accessible format (one that requires no programming or data-wrangling skills). RTI remains one of the backbones of investigative journalism, and almost the entire volume of government data obtained through RTI gets stored in printed format (all over the journalists’ offices). The barrier of programming skills is the most important factor keeping Indian journalists away from more explorative and in-depth usage of government data.

In his quick update on the students’ scene in Delhi, Parin told us that there is little excitement around data analysis, management and visualisation. The group found this troubling. In a later discussion, maybe we can talk about it more and develop plans for engaging students to work with government (and non-government) data.

We briefly discussed GapMinder and Ushahidi, and data visualisation work by the teams at the New York Times and the Guardian. These examples are well-known but how to recreate them (in a different context) is often not very clear.

At the end we went back to a question regarding the quality of data published on the data.gov.in portal that was raised earlier in the community interaction session of the NDSAP workshop held on 4th April 2013. We were informed by Subhransu and Varun that the data shared on the portal goes through a three-stage quality-checking procedure: (1) first, the data to be shared is put together and rechecked by the Data Creators (who are headed by a Data Controller) in the government agency concerned, (2) the Data Controller of the agency undertakes the second stage of quality checking, and (3) finally the data is shared with the NDSAP-PMU team at NIC, which rechecks the data before approving its uploading to the portal. If required, the NIC team asks the agency to share the raw data for comparing with the (formatted) shared data. Vibhu raised a crucial question about how dependable and representative such ‘raw data’ collected by the central government agencies are.

As the questions were getting tougher and the evening older, we concluded the meeting. The next meeting will be sometime in mid-May. The exact date and venue are yet to be decided.

Participants:

Guneet Narula, Sputznik

Amitangshu Acharya, Akvo

Isha Parihar, Akvo

Ravi Bajpai, Indian Express

Vibhu Tewary, Accountability Initiative

Pratap Vikram Singh, Governance Now

Shashank Srinivasan, Independent

Subhransu, NIC

Varun, NIC

Parin Sharma, Independent

Sumandro Chattapadhyay, Sarai-CSDS