Delhi Jal Board and Open Water Data: Report from DataMeet-Up

A DataMeet-Up was held on Thursday, April 30, 2015, at the Akvo office to discuss the Summer Action Plan prepared by the Delhi Jal Board (DJB) and the data concerns thereof. Sundeep Narwani of Delhi Dialogue Commission presented the Action Plan. Kapil Mishra, MLA and Vice-Chairman of DJB, participated in the discussions and described the planned activities at DJB.

Here are the minutes of the meeting, prepared by Sandeep Mertia.

Sundeep Narwani started with a presentation on Delhi Jal Board’s Summer Action Plan (SAP) 2015. Some important points from his presentation were:

  • SAP is a short term measure for three months of summer
  • Big problem: 40% of Delhi does not have enough piped water networks, and thus tanker services and unauthorised supply exist
  • They have planned several measures for improving tube wells, infrastructure (replacing old lines) and repairs, grievance redressal, and sewage treatment.

The details are available in the Summer Action Plan document.

This was followed by a long discussion session on several issues and concerns related to water problems in Delhi. Some of the important questions, comments and suggestions were:

  • What is the authenticity of data which DJB has?
    Answer: Doubtful.
  • What’s the organisational structure of DJB, and its relationship with the MCD?
    Answer: DJB is a state body, independent from MCD
  • There is no data on bulk supply to colonies
  • Very little end user data. Lack of meter reading and averaged bills are part of the reasons for this problem
  • There is no way to interconnect supply between localities
  • No data on quality of water
  • Dr. Rajinder Kaur spoke about using existing spatial data maps of NCT/NCR, and the research conducted by the Indian Agricultural Research Institute on using spatial data for classifying ground water depth and quality
  • Dr. Renu Khosla, from Centre for Urban and Regional Excellence spoke about using GIS data for slums
  • Mr. Kapil Mishra, the Vice-Chairman of the DJB, spoke at length about how they plan to transform the DJB. He also promised full data sharing from DJB’s side.

After a general discussion on various water related issues in Delhi, in the last segment we focused on framing the data problems associated with SAP 2015.

  • Need to think about the water data which already exists.
  • Sundeep will put up a list of people and organisations which have data on water in Delhi
  • A suggestion was made to focus on Gram Sabha level data as well
  • Need to prioritize the issue of water access to all, the missing data on ‘access’ related problems and appropriate mechanisms
  • Some private bodies have been collecting data from GPRS meters, let’s try to open this data
  • Need to map the borewells
  • Need to interpret and understand the data which DJB requires.

To Do list

  • We will list out all data sets that have informed the SAP document [Time: 2 weeks]
  • Sundeep will share the data sets already available from DJB
  • Once the list of data sets is prepared by us, it will be submitted to DJB via Sundeep and Kapil, who will then see if the mentioned data sets can be opened up. [Approximate time: 1 month]
  • Once these data sets are available, we will evaluate the quality of these data sets
  • Identify data sets that are missing and the ones that require a better collection process

Resources

We are using a Google spreadsheet to list out all data sets that informed the Summer Action Plan.

We are using HackPad to collect various resources.

Images of notes taken by Namrata Mehta and Sumandro Chattapadhyay at the meeting:

DataMeet_2015.04.30_DJB_Water_Data_Notes_01

DataMeet_2015.04.30_DJB_Water_Data_Notes_02

DataMeet_2015.04.30_DJB_Water_Data_Notes_03

DataMeet_2015.04.30_DJB_Water_Data_Notes_04

Notes from DataMeet-Up in Delhi, 31 July 2014

After a long hiatus, we had a DataMeet-Up in Delhi on Friday, July 31. Thanks to the Centre for Internet and Society for hosting us.

The meet-up had a small but very productive mix of old and new faces. Here is the list of participants:

* Deeptanshu
* Guneet Narula, Sputznik
* Isha Parihar, Akvo Foundation
* Namrata Mehta, Center for Knowledge Societies
* Praachi Misra, Competition Commission of India
* Rajat Das, Contify
* Riju / Sumandro Chattapadhyay, ajantriks.net
* Rohith Jyotish, Centre for Budget and Governance Accountability
* Shobha SV, Breakthrough

We started with a round of ‘what is DataMeet’ and moved into ‘what should DataMeet do in Delhi.’ Here are the suggestions that came up in the meeting:

1. Data Liberation Strategy: We can work towards creating a strategy and workflow to undertake data liberation tasks. These tasks can focus on two types of data: (1) data that is not yet publicly available and needs to be brought out by requesting the authorities concerned and/or speaking to them about it, and (2) data that is publicly available but not in an open / directly-usable / machine-readable manner. We have of course already done some work on the second type of data in particular, such as with the MP constituency boundaries shapefile and the scraping of weather data. It will be useful to prepare and document strategies for such tasks.

Deeptanshu suggested that an important available-but-not-machine-readable dataset that we can work with in the near future is the proceedings of the parliament published on the parliament’s website. We can possibly ask ADR and PRS whether they have done any work towards converting that data to machine-readable formats.

2. Learning and Sharing: We felt that DataMeet should undertake pedagogic functions, from internal training / sharing sessions among DataMeet members, to public workshops on data and visualisation tools and techniques, to online documentation of the same. It seems that the existing (regular or otherwise) members of the Delhi chapter of DataMeet are a good mix of those who look forward to picking up data / visualisation / programming skills and those who can offer to teach them. Often the latter group looks forward to learning about available datasets, ways of interpreting government data (from NSSO to budget sheets), and legal considerations associated with data. All of this the former group (who want to learn data / visualisation / programming skills) can offer to help with. Hence it makes a lot of sense to convert our monthly meet-ups into short learning and sharing sessions.

Further, we can document the learning and sharing taking place in the meet-ups and put it up as online references. This will slowly create a knowledge base, with contributions from across the city chapters. There was a short discussion if we should use a Wiki to create such a knowledge base or a WordPress blog. The programming group is more comfortable with the former, while the non-programming group is more comfortable with the latter. With WordPress providing detailed ‘edit history,’ I guess it is alright to use WordPress for the sake of general ease of use.

Let us start the documentation over the next 3-4 meet-ups and think of what is the best way to upload it – either as a section of DataMeet blog / wiki / github or a sub-site.

3. DataMeet-Ups as Tiny Hackathons: It was suggested that on each DataMeet-Up, we take up a particular task — either of data liberation or of data visualisation — and focus on a particular topic and dataset, and spend time together working on the task. This will include thinking about the task, creating a workflow, sharing the skills concerned, and doing the task. And finally we showcase the work done through the DataMeet blog and elsewhere.

Further, this will also produce visible evidence that the government data made available at the portal is actually being used, and thus raise awareness of the available data and the demand for it.

4. Legal and Policy Discussion: It was briefly mentioned that some members of the group often face questions related to legal and policy context of open government data, and also regarding opening of non-governmental data. We should look for resource persons and organisations to advise on such issues. The DataMeet mailing list can also function as a primary discussion space for these topics. However, the mailing list can be too public a space for certain discussions.

Open Data Camp Delhi 2014

We had an initial chat about organising the Open Data Camp in Delhi in November 2014. The date and venue discussion is pending. We will take that up in the next DataMeet-Up.

The two primary objectives of the Open Data Camp Delhi are (1) to hold a social and networking event for open data people (who are talking about and/or working with open data) in Delhi, and (2) to learn about their interests and challenges and prepare the roadmap for the Delhi chapter of DataMeet. Clearly, the first objective is more community-facing, and the second one is DataMeet-facing.

Here is the draft agenda for the Open Data Camp Delhi:

09:30-10:00 Ice-Breaker
10:00-10:30 Open Data and DataMeet [What is open data? What is DataMeet? Why is DataMeet? Why is open data relevant?]
10:30-11:30 Lightning Talks #1 [6 talks of 8 minutes each]
11:30-12:00 Tea/Coffee
12:00-13:00 Lightning Talks #2 [6 talks of 8 minutes each]
13:00-14:00 Lunch
14:00-16:00 Open Data Matchmaking Session [We set up two boards at the beginning of the day. One for writing down what data project one has in mind and what skills are required, and the other for writing down what data skills one can offer. On the basis of this, people meet up during the matchmaking session and talk about their plans.]
16:00-17:00 Closing and Thanks followed by Tea/Coffee
17:00-18:00 DataMeet Roadmap Discussion [Open to anyone who wants to participate]

It was suggested that lightning talks should be chosen as a combination of directly selected (by organisers) and community selected (through a submission and voting mechanism) modes.

DataMeet-Up in August 2014

We planned the next Delhi DataMeet-Up to take place on Wednesday, August 27, afternoon, where we will work on visualising datasets related to budget 2014. Rohith from CBGA, and his colleagues, will help us select the datasets and interpret them.

The venue is yet to be decided. Possible options are Akvo, CKS, Sarai, and Youth Ki Awaaz. Maybe CBGA can host it too.

Further, this also works as a warm-up session for the Hack the Budget event being organised by the World Bank in September.

Notes from DataMeet-Up in Delhi, 22 November 2013

We had a DataMeet-Up on Friday, November 22, 2013, at the Akvo office in Yusuf Sarai Community Centre, Delhi.

Here are the notes from the meet-up [additional information in square brackets]:

Election Data Hackathon

  • We will undertake a collaborative mapping of datasets relevant for election data hackathon, using GitHub and Google Drive. More details about this below.
  • Datasets that we are trying to locate include: election results data (total vote count, vote count per party/candidate, etc.), total utilisation and composition of utilisation of MP Local Area Development funds, parliamentary activities of MPs (presence/absence, questions asked, bills discussed, committees joined, etc.), crime data corresponding to constituencies, etc.
  • We will identify organisations who might hold additional relevant data, such as PRS Legislative Research, Association for Democratic Reforms (and MyNeta.info), Gramener, and Hindustan Times [Anika used to work at HT].
  • Two caveats: (1) we may not get unique and standard identifiers across datasets, and (2) calculations may get difficult in case of by-elections [Lok Sabha Secretariat will have details of all by-elections, which can be accessed through RTI request].
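
The first caveat, the lack of shared identifiers across datasets, can be illustrated with a minimal sketch. It assumes two datasets keyed only by constituency name and joins them on a normalised form of that name. The function names and all values below are invented for illustration and do not come from any of the datasets mentioned above.

```python
import re

def normalise(name: str) -> str:
    """Lowercase a constituency name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def join_on_name(left: dict, right: dict):
    """Join two {name: value} mappings on the normalised name.

    Returns (matched, unmatched): matched maps each normalised key to a
    (left_value, right_value) pair; unmatched lists the original names
    found in only one of the two mappings.
    """
    left_keys = {normalise(k): k for k in left}
    right_keys = {normalise(k): k for k in right}
    matched = {
        key: (left[left_keys[key]], right[right_keys[key]])
        for key in left_keys.keys() & right_keys.keys()
    }
    unmatched = sorted(
        [left_keys[k] for k in left_keys.keys() - right_keys.keys()]
        + [right_keys[k] for k in right_keys.keys() - left_keys.keys()]
    )
    return matched, unmatched

# Invented placeholder values, for illustration only.
votes = {"Chandni Chowk": 100, "New Delhi": 200, "East Delhi": 300}
funds = {"CHANDNI-CHOWK": 1.5, "new delhi": 2.5, "North East Delhi": 3.5}

matched, unmatched = join_on_name(votes, funds)
print(matched)
print(unmatched)
```

Name normalisation of this kind only handles differences in case, spacing and punctuation. Spelling variants and renamed constituencies would still end up in the unmatched list and need manual review, and two distinct names that normalise to the same key would silently overwrite each other.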

Hack for Change on Women’s Rights

  • Shobha, Breakthrough.tv, led the discussion on the planned Hack for Change event being organised by Breakthrough and Hacks/Hackers, as part of the 16 days of activism against violence against women.
  • The hackathon is organised around urban safety data from Whypoll, multimedia evidence of early marriage practices in Bihar and Jharkhand gathered by Gramvaani, etc. It will also include a Wikipedia Edit-athon facilitated by Noopur Raval.
  • There were multi-directional discussions around other datasets of relevance for the hack event, which I have not kept track of very well. Overall, there were discussions around datasets available from , those published by National Crime Records Bureau, FIR and call database of Delhi police (and how to access that), and data on violence against women gathered by Tata Institute of Social Sciences from police stations across seven states.

Presentation on iPython

  • Konark Modi presented a detailed introduction to using iPython to undertake data cleaning in a very organised manner, as well as the collaboration features/workflow of iPython.
  • There emerged a demand for a tutorial on OpenRefine (previously Google Refine), which will be organised in a later meeting.

Mapping Indian Election Data

  • We will start documenting publicly available datasets relevant for studying past General (Lok Sabha) elections in India and the activities of the elected members at present. One can contribute to this mapping exercise in two ways, as mentioned below.
  • GitHub: We have created a repository for this data mapping exercise under the DataMeet organisation at GitHub. The organisation page can be accessed here, and the (india-election-data) repository can be accessed here. In the repository, I have created a draft format for documenting the identified datasets. This draft format can be accessed here. Please feel free to suggest changes to the draft format by opening an issue.
  • To document a dataset, use the format given in the repository, fill in the details, and rename the file according to the dataset’s name, such as “election-results-delhi-1995.md”. Then, if you notice any need for data cleaning/reorganisation or any lack of clarity regarding the dataset, open an issue (mentioning the name of the dataset) to note that task.
  • Google Drive spreadsheet: Alternatively, you can access this spreadsheet on Google Drive and add the relevant information about the dataset documented by you.
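
As a small aid for the file-naming step, here is a minimal sketch that turns a human-readable dataset name into a filename in the style of the example above. The function name and the exact rule (lowercase, hyphen-separated, `.md` suffix) are assumptions inferred from that single example, not a convention defined in the repository.

```python
import re

def dataset_filename(name: str) -> str:
    """Build a lowercase, hyphen-separated .md filename from a dataset name."""
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("dataset name must contain at least one letter or digit")
    return f"{slug}.md"

print(dataset_filename("Election Results Delhi 1995"))
```

For the sample input this prints `election-results-delhi-1995.md`, matching the example filename given in the notes.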

Please comment here or post to the DataMeet mailing list for any clarifications and suggestions.

Notes from DataMeet-Up in Delhi, 31 May 2013

Authors: Satyakam Goswami & Nasr ul Hadi

Venue

Akvo Foundation, Yusuf Sarai, New Delhi

Discussion

This was our second meet-up in Delhi. We began with a round of introductions and a brief discussion about DataMeet’s (Bangalore-centric) history.

Data Journalism

Nasr shared his experiences trying to apply data science to journalism and how, in most cases, the data was either incomplete, not specific enough or simply inaccessible. One of his projects involved patrolling with the Delhi Police and tracking the number of accident victims they transported to the Jai Prakash Narayan Trauma Center. Out of curiosity, Satyakam asked him more about how the trauma centre worked, because they have been using a FOSS SMS service to reduce clogging of their registration/reception counters.

Nasr also discussed the work of organisations like Human Rights Law Network and the immense amount of data they have on instances of abuse, child incarceration, etc in Indian prisons. Unfortunately, these aren’t digital. Guneet gave a similar description of police reform work done by CHRI.

Collaborations with Other Groups

We also discussed opportunities to collaborate with other meetups that share our interest in data science. Hacks/Hackers’ Delhi chapter conducted a news apps hackathon in February. Their next event will focus on data journalism, probably along the lines of the recent Editors’ Lab. We still need to discuss how to:

  1. educate journalists on the importance of data-driven journalism, and
  2. enable them with the tools and possible collaborations with people in this group.

We are also open to other such collaborations, but couldn’t think of any other community with a similar interest. If you know of any, please do let us know.

Elections 2014 and Public Infrastructure

Everyone was interested in building apps around the then-upcoming Delhi Elections and then the General Elections in 2014. Anyone interested in taking the lead on this, please do so as soon as possible.

Akvo’s Isha told us how they are working closely with organisations that collect a substantial amount of district-level data about water/sanitation infrastructure. To collect data in the field, they use/promote a self-developed Android-based software called FLOW.

Vivek also brought up data.gov.in and how he found some of their data to be incomplete. Satyakam suggested we work closely with the data.gov.in team in order to get the data we want. Developers in attendance suggested we ask the portal for a web services API. Surendran described how they do this at Change.org. Google’s Raman Jit joined us close to the wrap-up and offered to help with sourcing any data we need from various government stakeholders.

We still have to decide a date and venue for the June meet.

Participants

Guneet Narula, Sputznik

Isha Parihar, Akvo

Ashim Kapoor

Nasr ul Hadi

Vivek Khurana, Mintango Technologies

Surendran Balachandran, Change.org

Gaurav, Film Maker

Gora Mohanty

Irshad Reyaz, Landshark Labs

Rahul

Raman Jit Singh Chima, Google India

Satyakam Goswami, Consultant

Notes from DataMeet-Up in Delhi, 12 April 2013

Last Friday (12th April), a DataMeet-Up was organised at Sarai-CSDS, New Delhi.

There has been talk of a meeting like this in Delhi for a long time now. A few hackathons and data-related events have been organised in the past (see this and this). With the recently concluded Open Data Camp in Bangalore and the substantial buzz (at least in Delhi) created by the 12th Plan Hackathon earlier this month, the timing of this DataMeet-Up was quite apt to take a step back, focus on big picture issues, and keep building the open data community in Delhi.

The meet-up began with a round of introductions, and a brief discussion about the DataMeet group and its (Bangalore-centric) history.

This was followed by a discussion of the 12th Plan Hackathon experience. The presence of members of prize-winning teams (from the IIT Delhi venue) and representatives of the NDSAP-PMU (NDSAP Project Management Unit) team energised the conversation. The hackathon was found to be a positive experience overall. It especially succeeded in creating a set of initiatives to address the difficult task of making planning documents, statistical evidence and proposals for allocation more accessible to (at least some of) the people of the country. The demanding nature of the datasets and documents made available for the hackathon (in terms of the required background understanding of the themes) perhaps led a number of submissions to engage with issues not directly associated with the 12th Plan.

From the experiences of the hackathon, we quickly moved to talk about ‘what’s next?’. Responding to the demand for a public API to access the datasets hosted at the data portal, Subhransu and Varun from NIC talked about the ongoing efforts to develop the second version of the OGPL software that the data portal uses. Unlike the existing version, the second version will not host uploaded datasets as individual files but as structured/linked data (following RDF specifications). While this was of great interest to some of the participants in the meet-up, it was not immediately clear to everybody why this shift (from separate data files to structured/linked data) is such a big deal. So we spent some time discussing the semantic web, linked data and APIs.
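
To make the difference between separate data files and linked data more concrete, here is a minimal sketch in plain Python. It is not how OGPL stores data, and the namespace, dataset name and row values are all invented for illustration. It only shows the core idea behind RDF: each cell of a table becomes a (subject, predicate, object) statement, so that statements from many datasets can be queried together by pattern.

```python
BASE = "http://example.org/"  # hypothetical namespace, not the data portal's

def row_to_triples(dataset: str, row_id: str, row: dict) -> list:
    """Flatten one table row into (subject, predicate, object) triples."""
    subject = f"{BASE}{dataset}/{row_id}"
    return [(subject, f"{BASE}property/{column}", value) for column, value in row.items()]

def match(triples: list, subject=None, predicate=None, obj=None) -> list:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Invented placeholder row, for illustration only.
triples = row_to_triples("sample-dataset", "row1", {"sector": "health", "year": 2012})

for triple in match(triples, predicate=f"{BASE}property/sector"):
    print(triple)
```

Real RDF adds typed literals, shared vocabularies and standard serialisations on top of this, but the pattern-matching query above is the part that a file-by-file download cannot offer directly.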

Another thread of ‘what’s next?’ discussion explored official and un-official processes for requesting government agencies to publish specific datasets on the data portal, and also on how (and whether) non-government agencies can share cleaned-up government datasets through the data portal. We talked about approaching Data Controllers for the agencies concerned, endorsing each other’s data requests to drive community-based demands, and also the possibility of an alternative portal (perhaps using OGPL itself) to share governmental datasets cleaned up and reformatted by individuals and organisations. The group also noted and celebrated the initiative from NIC to use GitHub for storing and sharing code and data used in visualisations, data cleaning processes and data-based applications.

The challenge of unclear licensing of data, both in the case of bought data products (such as Census of India and National Sample Survey) and publicly available datasets, was flagged but was not discussed fully.

Next, we had a round of inputs from the participants on what kind of data they have been collecting and working with, and what software they use.

Akvo is working closely with organisation(s) that collect a substantial amount of district-level data in certain focus regions, including water infrastructure, quality and usage data. For field-level data collection, they use/promote a self-developed Android-based software called FLOW. They mentioned that the absence of good quality basemaps (either vector or satellite imagery-based) for the areas they are working in makes environmental data collection rather difficult. The changed Google Maps API terms of use are forcing them to consider other options and move off Google (Satellite) Maps. It was suggested that they should try using imagery from Bhuvan. They also expressed interest in using data from and contributing to the OpenStreetMap project.

Accountability Initiative, located in the Centre for Policy Research in New Delhi, collects, digitises, cleans up, uses, analyses and archives a substantial volume of national and state budget data and utilisation reports. They, however, cannot share the (digitised and cleaned up) data publicly due to ambiguous and missing licence agreements. They are producing a large volume of data analysis but relatively few visualisations. They mostly use Microsoft Excel and Stata for their data operations. Picking up the thread from Vibhu (of Accountability Initiative), Subhransu (of NIC) talked about the data cleaning challenges NIC is facing while working with various government agencies to open up their respective datasets.

Ravi and Pratap talked about the data usage situation in the journalism world. They mentioned that most journalists prefer accessing government data in hard (printed) copies, as that is seen as a permanent, easily archivable, and easily accessible (without knowing programming and data-wrangling skills) format. RTI remains one of the backbones of investigative journalism, and almost the entire volume of government data obtained through RTI gets stored in printed format (all over the journalists’ offices). The barrier of programming skills is the most important factor keeping Indian journalists away from more explorative and in-depth usage of government data.

In his quick update on the students’ scene in Delhi, Parin told us that there is little excitement around data analysis, management and visualisation. The group found this troubling. In a later discussion, maybe we can talk about it more and develop plans for engaging students to work with government (and non-government) data.

We briefly discussed GapMinder and Ushahidi, and data visualisation work by the teams at the New York Times and the Guardian. These examples are well-known but how to recreate them (in a different context) is often not very clear.

At the end we went back to a question regarding the quality of data published on the data.gov.in portal that was raised earlier in the community interaction session of the NDSAP workshop held on 4th April 2013. We were informed by Subhransu and Varun that the data shared on the portal goes through a three-stage quality-checking procedure: (1) first, the data to be shared is put together and rechecked by the Data Creators (who are headed by a Data Controller) in the government agency concerned, (2) the Data Controller of the agency undertakes the second stage of quality checking, and (3) finally the data is shared with the NDSAP-PMU team at NIC, which rechecks the data before approving its upload to the portal. If required, the NIC team asks the agency to share the raw data for comparison with the (formatted) shared data. Vibhu raised a crucial question about how dependable and representative such ‘raw data’ collected by the central government agencies are.

As the questions were getting tougher and the evening older, we concluded the meeting. The next meeting will be sometime in mid-May. The exact date and venue are yet to be decided.

Participants:

Guneet Narula, Sputznik

Amitangshu Acharya, Akvo

Isha Parihar, Akvo

Ravi Bajpai, Indian Express

Vibhu Tewary, Accountability Initiative

Pratap Vikram Singh, Governance Now

Shashank Srinivasan, Independent

Subhransu, NIC

Varun, NIC

Parin Sharma, Independent

Sumandro Chattapadhyay, Sarai-CSDS