Data Diaries: What I learned

As some of you might know, I’ve recently moved back to the US, and after taking a break I wanted to share some of my thoughts on the past 7 years of Open Data in India. These are just some of the big lessons I’ve learned and observations that I think are important.

Data needs advocates from every sector

Historically, the biggest voices that government hears about data are corporations selling products or statisticians acting as gatekeepers. Now that data is a part of everybody’s life in ways that are unseen, data literacy is necessary for everyone and data needs advocates from every walk of life. What I experienced with DataMeet was that broad data ideas with inputs from experts from all sectors can be very powerful. When you advocate for the data itself and how it needs to be accessible for everyone, you can give solutions and perspectives that statisticians and for-profit companies can’t: ideas that are new because they are in the best interest of the whole. That’s why we are invited to the table. Even though it doesn’t make political or economic sense (yet) to listen to us, it is a different perspective that is helpful to know.

This is why every sector (education, environment, journalism) and every actor has to integrate a data advocacy component into their work. Issues of collection, management, and access affect your work, and when you go to talk to governments about the issues you want to improve, creating better data and making it easier to get should automatically be a part of it. The attitude of “I got the data I need so I’m good” does not make it any easier the next time you need data, or when you are upset with the quality of data being used to create policy.

Building ecosystems is more important than projects

In 2011 when I started to work on water data, it became clear that there was no techie/data ecosystem for non profits to tap into for advice and talent. There were individuals but no larger culture of tech/data for public good. This hadn’t been the case in the US so when I was at India Water Portal I wanted to spend time to find it because it’s really important for success. I was basically told by several people that it wasn’t possible in India. That people don’t really volunteer or share in the way the west does. It will be difficult to achieve.

With open data growing quickly into an international fad with lots of funding from places like Open Gov Partnership and Omidyar, I knew open data projects were going to happen. But they would be in silos and they would largely not be successful. A culture that asks for and demands data, and then has the means to use it, is not something that is created from funded projects. It comes from connecting people who have the same issues and demonstrating the demand.

DataMeet’s largely been a successful community but not a great organization. This is my fault. A lot of my decisions were guided by those early issues. It was important to have a group of people demonstrating demand, need, and solutions who weren’t paid to be advocates but who were interested in the problem and found a safe space to try to work on it. That is how you change culture; that is why I meet people who say, “I believe in open data because of DataMeet.” That would not have happened as much if we just did projects.

You can’t fundamentally improve governance by having access to data.

It is what we work toward as a movement, but it just doesn’t really work that way, because bad governance is not caused by a lack of information or of data utilization. Accountability can’t happen without information or data, and good governance can’t happen without accountability. But all the work spent on getting the government to collect and better use data is often not useful, mostly because of a lack of understanding of the root cause of the issue. I found that budget problems, understaffing, overstressed firefighting, corruption, interest groups, and just plain apathy are more to blame than the lack of information. This is something that civil society has to relearn all the time. That is not to say data can’t help with these things, but if your plan is to give the government data and think it will solve a problem, you are wasting time. Instead you should be using that data to create accountability structures that the government has to answer to, or to support accountability influences that are already in use.

You gotta collect data

Funding that doesn’t include data collection, cleaning, and processing costs is pointless. Data collection is expensive but necessary. In a context like India’s, where it is clear that the government will not reach the data collection levels that are necessary, you have to look at data collection as a required investment. India’s large, established civil society and social sector is one of its strongest assets, and it collects tons of data, but not consistently. A lot of projects I encountered were based on the western model, where the data is there: even if it is not accessible, it is complete somewhere. NOPE. They count on the data existing and don’t bother to think about the problem of collection, clean up, processing, and distribution. You have to collect data and do it consistently; it has to become integrated in your mission.

Data is a pretty good indicator of how big a gap exists between two people trying to communicate.

100% of data-related conversations go like this: “The data says this but I know from experience that…” Two people will have different values, and communicating a value by saying “I think you should track xyz also, because it’s an important part of the story” can be a very productive way to work out differences. That is why open data methodology is so important. It also becomes a strong way for diverse interests to communicate, and that is always a good thing.

Data is a commons

In places that still don’t have the best infrastructure, where institutions and official channels aren’t the most consistent, the best thing you can do is make information open and free. It will force issues out, create bigger incentives for solutions, and those solutions will be cheaper. Openness can be a substitute for money if there is an ecosystem to support the work.

You can collect lots of data but keeping it gets society nowhere.

A lot of people in India are wasting a lot of time doing the same thing over and over again. If I had 5 rupees for every person I spoke to who said they had already processed a shapefile that we had just done, or had worked with some other dataset that is hard to clean up, I could buy the Taj Mahal. Data issues in the country are decades old, and not sharing data stunts progress. Momentum is created by rapid sharing of information and solutions; proprietary systems and data hoarding don’t create it. The common societal platforms that are making their way around India’s civil society and private company meeting rooms won’t do it either. You can’t design a locked-in platform with every use in mind, which is why non-open portals have generally had such limited success. If you have solved a hard problem and make the solution open, you save future generations from having to literally recreate the wheel you just made. How much more brainpower can you dedicate to the same problems? Let people be productive on new problems that haven’t been solved yet.

The data people in government are unsung heroes.

Whenever I met an actual worker at the NIC or BHUVAN or any of the data/tech departments, they were very smart, very aware of the problems, and generally excited about the idea of DataMeet and that we could potentially help them solve a problem. It was not uncommon, in a meeting with people from a government tech project, for them to ask me to lobby another ministry to improve the data they have to process. While I wish I had that kind of influence, it made me appreciate that the government is filled with people trying their best with the restrictions they have. The government has “good bones”, as they say, and with better accountability could get to a better place.

I don’t think I covered everything but I’m very grateful for my time working on these issues in India. I feel like I was able to achieve something even though there is so much more to do. To meet all the people who are dedicated to solving hard problems with others and never giving up will inspire me for a long time.

 

 

Ethical Reporting of Data and Security Issues

Who is this document for?

In this era where we have a mobile application and a website for everyone, even with more software engineers, data leaks and security issues are a common phenomenon. This document will hopefully help anyone who finds leaks to report them ethically without causing too much harm.

What is the need for this document?
In India recently we have seen a lot of leaks, from government websites to JIO to Zomato to medical testing data. More engineers are willing to share these leaks on Twitter and more news organisations are covering them. But in most of these cases we have observed that care has not been taken to protect user information and privacy. Hence we decided to write this document.
Warning: This is not a comprehensive guide, but this is what we think are best practices to follow at this time. Please make sure you talk to a lawyer in addition to this.

First things first

Determining the criticality of an issue
A data leak or a security issue could have varied levels of criticality depending on who it causes damage to. Some leaks could cause loss of revenue for an organisation (e.g. the food ordering service Food Panda in India lost a lot of revenue because of a bug). Other leaks could result in invasion of privacy and end up affecting the personal lives of people, like the pathology clinic in Mumbai where the medical records of patients were leaked. Some leaks could have financial outcomes for the data subjects, such as leaks involving financial information, passwords to accounts tied to transaction ability, etc.
The more critical a leak, the more caution you have to exercise in reporting it, on social media or otherwise. So how does one determine the criticality? Here are some simple questions to guide you:
  1. Does this affect more than one person?
  2. Does reporting this damage lives in the long term, and is the damage irrevocable? For example, the leak of health records of HIV patients from a pathology lab or a hospital could affect the people involved by marking them for their entire life. This information, coupled with social stigma, could affect their prospects of a healthy life or their professional lives.
  3. How long will it take for the organisation to close the leak?
  4. In case there is no clear tangible impact, could this have an impact in combination with other public data?
Ideally, one should not publicize a leak without first informing the organisation affected and following up with them to close it.
Consequences I should consider when I report a security issue
Reporting a leak in India is always a tricky situation. It could lead to criminal proceedings against you. We advise that you always get legal advice before you report anything, especially if the organisation has not defined a bug reporting program. The Information Technology Act, 2000, for example, is one of the laws that define what is considered a crime with respect to actions online.
For example, some sections of the act, particularly Section 43, define it to be a crime to gain unintentional access to, or even download from or damage, a system.
Remember that reporting a leak sometimes takes extensive follow-ups. It needs perseverance and patience. Sometimes reporting a leak requires you to gain skills which are not technology oriented, e.g. networking to connect with decision makers who could help plug the issues, or even writing skills to share the issue online. One has to be prepared to learn and to get more support when not equipped with the right skills.
We highly recommend that you do a personal risk assessment before you move forward.
Risk Assessment for a reporter
Warning: This is only a guideline. We recommend you talk to your lawyers, security professionals, and your support network for a better assessment.
  1. Do I have support? By which we mean:
    • Financial means to support me in case of consequences
    • Personal networks to support me
    • Legal support systems
  2. Who else is at risk if I am in trouble?
  3. What are the chances that the organisation I am reporting against will act to prosecute me?
  4. Do I have the mental ability / support to handle the stress for that period of time?
  5. Do I have the technological support to help me protect myself and my loved ones?
Calculating the risk of a vulnerability
Security professionals around the world use some standard methodologies to calculate the risk of a vulnerability. One such tool that can help you calculate the risk is this:
It is highly recommended that you do a risk assessment of the vulnerability in order to help you strategise, or even decide whether to continue working on it.

Effectively document your finding 

Documenting your story
When you chance upon a leak and have confirmed it by double checking, it is highly recommended that you make elaborate notes on how you discovered it. This documentation should be done as soon as possible, ideally right after you are confident that it is a leak. The reason is that various stakeholders will ask you about this repeatedly, and having as many accurate details as possible will help support your case. You can use secure note taking platforms like etherpad / riseup pad to protect your privacy if you are making notes online. It is recommended that you store the notes offline in an encrypted format. Also take screenshots with time stamps (this could be a double-edged sword if you are liable to prosecution because of the leak).

Documenting the Bug

Elaborate documentation of the leak itself helps in getting it fixed faster. Engineers always find it useful to have more information.

  • Include steps showing how to replicate your bug, and describe any preconditions, e.g. accessing a particular page with a particular browser or with a certain phone.
  • Include screenshots if possible.

Effective process to share your findings 

Reaching out to concerned officials / Organisations

The right way to report a leak is to reach out to the organisation responsible and write to them about your findings before you go public (of course this has consequences, and we certainly don’t recommend it for Snowden or Manning type leaks; assessing your risk is very important before you do this). This can be done in many ways:

  • Bug Bounty Programs: Most technology orgs have a bug bounty program and sometimes also offer rewards for reporting leaks. This is the easiest and the most rewarding way to reach out to orgs.
  • Organisation Public Issue Trackers: Some open organisations do not have a bug bounty program but have public bug repositories, linked either to their code repos or to their websites. This is another way to report security leaks.
  • Community Outreach Co-ordinators: In the absence of a bug bounty program or issue tracker, some organisations have outreach coordinators, who are developer and business liaisons for the organisation. For any critical leak it is highly advisable to report the bug to them; this will ensure the leak is closed with minimal lead time.
  • Public / Private mail ids: While some prefer anonymous reporting, sometimes personal reporting helps build confidence and obtains quick results. If you wish to stay anonymous, it might be best to report to the public email ids available on the website. On the other hand, if you have a certain degree of confidence in the intent of an organisation, it might be best to use your personal network to reach out to them and talk them through it.

In case of leaks from government websites: Every country has a specific process for reporting leaks on government websites. In India specifically, CERT India is responsible for the security of government websites and it is best to report to them. The organizations one can reach out to are:

  1. CERT India
  2. MEITY
  3. The government body that is affected

Talking about the Security Issue in Public

It is very easy to talk about a security issue in public. Reporting is also considered an honor by many, and sometimes a necessity. We recommend the following actions if you plan to do so.

As a general rule, avoid talking about a leak or an issue before it has been fixed.

Initiating the removal of sensitive data

If you can’t get the complete issue fixed, make sure at least the sensitive data is removed before you share it. By sensitive data we mean Personally Identifiable Information. It is also recommended that you clean the data of secondary identifiers (identifiers that, when coupled with other information, can still make data personally identifiable). If you are not sure about identifiers, we recommend that you talk to organisations which have been working on Open Data for a while (below is a list of some organisations).
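As a rough illustration of the cleaning step, here is a minimal Python sketch that drops direct identifiers and coarsens secondary ones before a sample is shared. The column names (`name`, `phone`, `birth_date`, `pincode`, `test_result`) and the coarsening rules are assumptions made up for this example, not a standard; what counts as an identifier depends on the dataset, so treat it as a starting point rather than a complete anonymisation method.

```python
import csv
import io

# Hypothetical column names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "address"}
SECONDARY_IDENTIFIERS = {"birth_date", "pincode"}


def coarsen(column, value):
    """Reduce the precision of a secondary identifier."""
    if column == "birth_date":
        return value[:4]          # keep only the year of an ISO date
    if column == "pincode":
        return value[:3] + "XXX"  # keep only the leading digits
    return value


def redact(rows):
    """Drop direct identifiers and coarsen secondary ones."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in DIRECT_IDENTIFIERS:
                continue  # never share direct identifiers
            if column in SECONDARY_IDENTIFIERS:
                value = coarsen(column, value)
            out[column] = value
        cleaned.append(out)
    return cleaned


# A made-up record standing in for leaked data.
sample = io.StringIO(
    "name,phone,birth_date,pincode,test_result\n"
    "A. Person,9999999999,1984-03-12,560038,negative\n"
)
rows = list(csv.DictReader(sample))
redacted = redact(rows)
print(redacted)
```

Even after this kind of cleaning, a combination of the remaining fields can sometimes identify a person, so it is still worth asking someone experienced to review the result.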

Building a campaign

While sporadic tweeting or sharing helps sometimes, it is always recommended that you build a plan to talk about the leak.

  • Decide on your objectives for sharing the leak: what would you like to achieve?
  • Identify people who are working in this field and could help you amplify your voice.
  • Talk as accurately as you can about the effects of the leak and its origin.
  • It might be best to leave out details of how to reproduce the leak if you think they could harm more people.

Using Screenshots 

Sometimes using screenshots amplifies your report and its impact. We recommend using them as opposed to sharing the methodology of replication in public. One has to be cautious while sharing screenshots, though: make sure you block any personally identifiable information or information that could cause damage to lives or property.

Talking to Press

Before talking to the press, please be clear about your intentions in doing so. Again, we recommend this only after the issues have been fixed. But sometimes it is important that you talk to the press to help close the issues, and we understand that. So we have put together a set of things that would make this conversation ethical and effective.

Disclosure:

  • Make sure to disclose your intent in reporting leaks
  • Disclose any funding you have received to do this work
  • Avoid sharing a detailed description of the leak in case it has not been closed yet
  • Make sure not to share sensitive data, either through your screenshots or through data

 

List of organizations to ask for support

Online Security and Whistle Blower Protection

Open Data Questions 

Legal Support ( India ) 

ALF  

Research Methodologies 

Security Methodology and Advice

Further Reading

This guide was written by Chinmayi S K with contributions and feedback from Thejesh GN, Chris Kubeca, Amber Sinha and Nisha Thompson.

This post is released under the Creative Commons Share Alike 4.0 License

If you have any feedback or comments, please feel free to write to us through this contact form and we will get back to you asap.

Field Papers: How To

At the Indiranagar Data Party! Garbage Go! they had a few people who didn’t want to use technology to map garbage so Maanya and Aarthy printed out Field Papers for mapping. These worked really well and allowed for a more inclusive event.

Maanya from Mapbox made a how to for using Field Papers.


Step 1: Click on Make


Step 2: Go to the area you want to map and select with the rectangle.


Step 3: Download and print.

They will look like this and you can give them to people to map along the way.

20160929_104244

Data Party! Garbage Go!

At DataMeet we have spent years looking for data and trying to make it accessible. Over the last few years more and more data has been made public, which we are excited about. However, people demand data that fills the gaps in what already exists or that is more actionable. Data that people want and need isn’t being produced, and if it is being produced it isn’t being shared.

This is most true in urban spaces, where there are tons of projects dedicated to collecting data for the city but none of this data enters the public domain as open data. It isn’t public data because the government doesn’t collect it, and the various governance and civic oriented groups who collect the data are more prone to write reports or put the analyzed data up online than to share the usable and complete raw data.

So DataMeet, along with Oorvani Foundation and Mapunity, wants to start a monthly Data Party! We will pick a topic and try to collect as much data as we can over a month. Then we will make the data open for download on the OpenCity Urban Data Portal, send it to the appropriate person in the government, and write data stories on Citizen Matters.

So please join us on Sept 24th to kick off the first ever Data Party! Garbage Go! 

There are an estimated 9000 garbage blackspots in Bengaluru. We are trying to catch them all!

Sign up to map your neighborhood every day. Or join us for chai and snacks on Sept 24th and map with friends in 3 locations: Koramangala, Indiranagar or Frazertown.

You have to register and download the app so we can plan for the snacks.

Event location will be sent to you once you register.

Time is 9:30am to 12:30pm – Sept 24th Saturday morning.

9:30am – Intro and app explanation
10 to 12 – Mapping
12 to 12:30 – Closing and Next Steps.

All data collected will be made open on the OpenCity.in Urban Data Portal for download and use, and this data will be sent to the BBMP and followed up on.

Indiranagar – Maanya – Meeting place MapBox India

Koramangala – Nitin – Meeting place Sagar Fast Foods behind BDA complex

Frazertown – Contact Nisha Thompson – Meeting place French Loaf by Richards Park.

Register here.

Download the app and get mapping.

Link to Mapunity Groups iOS app.
Link to Mapunity Groups Android app.

Sikkim Data Portal and Sensitive Information

Sikkim was the first state to come up with its own Sikkim Open Data Acquisition and Accessibility Policy (SODAAP), on the lines of the National Data Sharing and Accessibility Policy (NDSAP). Continuing to lead, Sikkim is now officially the first state to have its own data portal. We are really happy to see this development and hope more states follow. DataMeet has been carrying out consultations with officials of Sikkim in framing the policy, and helping them with workshops and insights on using the data. Honorable Member of Parliament Dr. Prem Das Rai was also our keynote speaker at the Open Data Camp 2015 in Delhi, sharing experiences of the ongoing work in Sikkim.

As emails were being pushed about the launch of the portal on 15th July, we were alerted by Abhay Rana to sensitive data being published through the data portal. Two datasets on the portal had sensitive information on school children and teachers: 1) name, 2) religion, 3) caste, 4) father’s name, 5) mother’s name, 6) gender, 7) birth date, 8) residential address, and 9) information regarding disabilities (if any), with the additional detail of marital status for the teachers. We alerted both NIC and the chief data officer in charge of the datasets to get them taken down immediately. Open data does not promote sharing sensitive information publicly; doing so violates its core principles. We applaud the quick response by the data controller.

It was an unfortunate accident that sensitive information, which is not to be published under the policy, was shared through the data portal. NDSAP and SODAAP both mandate that every department make sure sensitive information has restricted access and is not published. This is not the first incident in which we have encountered sensitive information being published by government officials. Most of the time such information is in the public domain by accident, or due to a lack of awareness among officials about the types of fields and parameters in the datasets. More incidents like this can deter officials from publishing further data and are a threat to the ecosystem of open data.
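One simple safeguard against this kind of accident is to scan a dataset’s column headers for sensitive fields before it is published. The Python sketch below is only an illustration: the keyword list is an assumption loosely based on the fields named in this incident, the function name `flag_sensitive_columns` is made up, and the sample headers are invented. It deliberately errs on the side of flagging too much, so a human still has to review the result.

```python
import re

# Assumed keyword list, loosely based on the fields found in this incident.
SENSITIVE_TERMS = [
    "name", "religion", "caste", "father", "mother", "gender",
    "birth", "dob", "address", "disab", "marital",
]


def flag_sensitive_columns(headers):
    """Return the headers that look like they hold sensitive information."""
    flagged = []
    for header in headers:
        # Normalise "Date of Birth", "date_of_birth" etc. to "dateofbirth".
        normalized = re.sub(r"[^a-z]", "", header.lower())
        if any(term in normalized for term in SENSITIVE_TERMS):
            flagged.append(header)
    return flagged


# Invented headers standing in for a dataset awaiting publication.
headers = [
    "School Code", "Student Name", "Caste",
    "Date of Birth", "Class", "Residential Address",
]
flagged = flag_sensitive_columns(headers)
print(flagged)
```

A check like this will also flag harmless columns such as a school name, which is acceptable for a pre-publication review: the goal is to prompt a second look, not to decide automatically.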

As more and more data becomes part of the public domain it is important that we all can work together to ensure that we do not violate privacy or put up sensitive data. More guidelines and frameworks are needed to maintain and report sensitive data which is already public.

We request you to bring to our attention if any sensitive information is being published under the pretext of open data. For now explore the new data portal and use open data to bring positive change in your community.

12 DAYS TIL 2016 Bangalore Open Data Camp: Pollution Party!

DataMeet will be hosting the 5th Bangalore Open Data Camp: Pollution Party on May 14th and 15th. This year we want to look at the growing problem of pollution by spending two days examining the role of data. Last year saw a major turning point in the debate around pollution. Indian cities became a major focal point when data showed that New Delhi has worse air quality than reigning champion Beijing. This put a spotlight on the air pollution problem across India. At the same time, water pollution from industry has come to attention through foaming lakes and rap videos fighting for recognition of pollution and its effects on people. Economic growth and development have meant that the building industry has been in overdrive, bringing sand and dust into urban and peri-urban areas in large quantities, and the growing lack of proper trash disposal has had major health implications for people from all socioeconomic backgrounds.

However, the actual exposure to pathogens and pollution is not well known: extensive data has not been made available, or is being collected in a way that can’t be easily understood or acted upon. This has spurred the rise of data collection networks and agencies to fill the gap. In every major city, cheap citizen-supported sensor devices have been put up to add data to the small number of official government monitoring stations.

This year at Open Data Camp we want to explore the role of these data collection networks in a growing citizen and private sector monitoring effort. What is the role of open data? When these networks grow, can there be agreement on standards and formats to be maintained? And are there financially sustainable solutions that can be built on open data?

Notably Karnataka State Pollution Control Board is attending to give the keynote in the morning and hopefully bring some data with them for us.

Tentative Agenda

1) Karnataka Pollution Control Board

2) Environmental Groups to give the general ecosystem around enforcement

3) Data collection networks
Sensors without Borders
IndiaSpend*
Hindustan Times*
YUKTIX – Open Weather Network Bangalore
India Open Data Association

4) Water Pollution
Ground water
Urban lakes

5) What can you do with robust data?
Urban planning
Transport
Modeling for enforcement.

6) Open Environmental Formats and Information Discussion

Day 2

We will be hosting a sensor workshop for kids http://odc.datameet.org/sensor_workshop

Sensor workshop poster

We’d like to thank our sponsors Google, Sensors without Borders, India Open Data Association, Oorvani Foundation, and partner Reap Benefit. If you would like to sponsor or get involved please contact me @ Nisha (at) Datameet.org

Meet a DMer: Dilip Damle

On the DataMeet list we have started referring to each other as DMers. So I wanted to start highlighting people who are pretty interesting and have great insights into open data.

Dilip has been a major contributor to the list for a few years. He is always sharing data, advice, and information. He has contributed to the pincode and shapefile conversations and is always a source of support.

Where are you from? What do you do? 

I am from India, born and studied in Goa. Presently (last 28 years) based in Delhi.

By qualification I am a Mechanical Engineer.
Presently I am a freelancer (I worked as an employee between 1981 and 1992). As a one-man SOHO professional I provide services to different private organisations, both for themselves and for private organisations that in turn provide services to government agencies. My area of specialisation is mainly the application of computers to engineering, CAD, technical publications, cartography, data maintenance, MIS reports and custom software.

I am a part time hobby programmer and have been programming since 1983 for fun and to automate my own work– VB, VBA and Autolisp.

How did you find out about DataMeet? 

I wanted to make and publish an editable version of election maps and was looking for a source of updated maps after delimitation. I bumped into [Raphael] Susewind’s blog and via that page came to know about DataMeet.

Why are you interested in data?

Mainly to make editable maps in common software, which I have a plan to offer free. More recently I have been doing less work on CAD and more on databases. In the process I am also hooked to the beauty of clean data represented especially in Database as against Excel.

Do you believe in open data? and why?

Yes, at least the data that is relevant to society as a whole.

Reasons:

  1. Only open data can be that Single Truth. Otherwise multiple mismatching versions float around for commercial reasons.
  2. There are no unnecessary fights over wrong data.
    (The most classic example is India’s boundary map. In this world of computers we have not provided a “Correct” boundary accessible to all in a digital format, and want to stop all the freely available “Incorrect” data just by legislation, expecting everyone to hold a print in their hand and come to Dehradun for “approval”. It is ridiculous.)
  3. Let there be commercial exploitation by value addition like visualization, Web Access but raw data generated by agencies that run from taxpayer’s money should be available in the open. Except for security, military and personally identifiable data.

What do you hope to learn?

I hope to interact with varied people and know newer things and techniques  that I might not have even heard of before.

What is your impression of the DataMeet community?

Good people, but it is too small; it needs to be bigger.

What kind of civic projects do you work on? What kinds of civic projects are you interested in working on?

I have worked on water supply and sewer networks, mainly the application of computers, for several years. A little on storm water.
In future I would love to work on transportation modeling.

Share a visualization that you saw recently that made a big impression?  Share an article you have read recently that made a big impression? (does not have to be data related)

A visualisation about Evolution.

Open Access Week 2015

Late post

Open Access Week is used as an opportunity to spread awareness of open access issues throughout the world. It was Oct 24th to the 30th last year. Shravan and Mahroof from the Ahmedabad Chapter suggested we do the first ever multi-city hangout and bring together different groups working on openness issues throughout the country.

For the event we had a Google Hangout with:

Data.Gov.In started us off with  Alka Misra and Sitansu participating from Delhi. They spoke about new features on Data.Gov.in, new datasets and visualizations available. They were also there to extend invites for more participation from the community.

Rahmanuddin from Access to Knowledge then spoke about Wikipedia and their community dedicated to local language knowledge sharing. They also had pertinent questions to Data.Gov.In regarding using open licenses. Since Wikipedia can’t use any data from Data.Gov.In since a license isn’t specified.

The Ahmedabad Chapter went next. Ramya Bhatt, Assistant Municipal Commissioner from Ahmedabad, came and gave a brief talk about their plans for open data and smart cities. Alka from Data.Gov.In offered assistance. Then some students from Dhirubhai Ambani Institute of Information and Communication Technology’s machine learning program used some data from Data.Gov.In to do analysis at the event. They looked at high budget allocation per state and dropout rates.

Open Access India’s Sridhar Gutam briefly went through the plans OAI has for the upcoming year to promote open access science and journals.

Hyderabad DataMeet is a new meetup that has yet to really take shape, but we were happy to see a first attempt. Sailendra took the lead as the organizer and brought together some people from IIM Hyderabad. Srinivas Kodali was there to talk about all the data he had made available that week.

 

Bangalore DataMeet was there to share what has been going on with DataMeet and any new initiatives in Open Access.

It was a great event, and as with all online events there were some technical difficulties but everyone was patient. It was awesome to see how the open culture space has grown, and to see so many new DataMeet chapters.

You can see the event below:

I hope we do one again soon minus the technical difficulties.

Global Open Data Index: Water Quality

Last year I helped assess the water quality section of the Global Open Data Index (GODI). Given the news of lead poisoning in Flint, Michigan and increasingly beyond, safe drinking water is no longer assured even in countries where it’s been guaranteed, so I am very glad they included it in GODI.

GODI is a survey of 122* countries that looks at the status of ‘high priority datasets’ and whether they are truly open according to the Open Data Criteria. Water quality was included last year for the first time. So my job was to examine each country’s submission and assess whether the data submitted was what was asked for and met the criteria for being open. This was a daunting task but I figured if I could find water quality data in India of all places it wouldn’t be impossible.

Assessment Criteria/Methodology

GODI looked for very specific parameters:

While there are a lot more parameters that could be asked for, these were a good sample of parameters to assess if there is robust water monitoring in the country.

After the initial submission phase there were a lot of questions about why the survey wouldn’t just ask for drinking water quality data or environmental monitoring data.

Choosing parameters instead of programmes is important because monitoring the environment and drinking water quality are connected. Some countries haven’t really established large nationalized water treatment strategies. There, drinking water comes directly from a natural resource, so the environmental monitoring data inadvertently applies to the drinking water scenario. This means that if a country really has robust water quality data it must have these 5 parameters, because they cover surface and ground water sources and also reflect safe drinking water standards.

For instance, the assessment would be rejected if a submitter only found the surface water body monitoring stations (environmental water monitoring), because arsenic and fluoride are only found in groundwater. So the submitter would ideally find either the treated drinking water quality data, which will cover all the parameters, or the source water quality data for both surface and ground water.
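As a rough sketch, that acceptance rule can be written as a small check. The labels and the function name below are made up for illustration and are not part of GODI's methodology; the sketch also assumes, as described above, that treated drinking water data covers every parameter on its own.

```python
# Hypothetical labels for the kinds of dataset a submitter might find.
TREATED = "treated_drinking_water"
SURFACE = "surface_source_water"
GROUND = "ground_source_water"


def covers_all_parameters(dataset_kinds):
    """Return True if the datasets found could cover every required parameter.

    Assumption: treated drinking water data covers everything by itself;
    otherwise source data is needed for both surface and ground water.
    """
    kinds = set(dataset_kinds)
    if TREATED in kinds:
        return True
    # Surface-only data misses groundwater parameters such as arsenic and fluoride.
    return {SURFACE, GROUND} <= kinds


print(covers_all_parameters([SURFACE]))          # surface stations only
print(covers_all_parameters([SURFACE, GROUND]))  # both source types
print(covers_all_parameters([TREATED]))          # treated water alone
```

A submission with only surface water monitoring stations fails this check, while the other two combinations pass.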

For a full look at the methodology of the entire survey go here.

Some background

There is no one way to create water management systems, but there are two major ways by which people get water: directly from the source, or piped in from a source or a treatment facility. The origin of the water is important. If you are getting water from the ground there are different quality issues than with surface water (a lake or river). If water is from a treatment plant there is a possibility that the plant is getting water from surface water, ground water, and in some cases recycled water. Usually water quality is measured at source and after treatment (treatment plants take multiple water quality samples during the treatment process).

A full water quality assessment means lots of parameters and not all of them are tested the same way; some parameters take several days and require specific conditions, others can be taken easily through filters or litmus papers. Water quality testing is a deliberate process of sampling and testing, and it is not as easy as sticking a sensor into the water and monitoring a continuous feed of data (although the potential for these approaches is quickly growing as technology improves).

What I looked for

Since water quality testing is a scientific process, I figured that if I found any proof of water treatment or quality monitoring, a dataset would not be far off. After going through a few countries I noticed that the different water management approaches and policies affected where you would find the data.

Most countries give drinking water treatment responsibilities to local bodies, but treatment is sometimes monitored by the central government under public health regulation, so aggregated data could lie with the public health ministry or the environmental protection body. In most cases responsibility for environmental monitoring fell to a central government Environmental Ministry.

This scenario means that multiple datasets exist. One is a centralized dataset for surface and groundwater that usually lies with the environmental ministry; it could have all the parameters but sometimes doesn’t, or it doesn’t have real time data (this means data may be available, but from less frequent collection such as quarterly or half yearly efforts). The other is the Public Health Ministry’s reports of water quality, which have all the parameters but are aggregated, usually in report form (not a dataset), and not updated in a timely manner.

The US, for instance, falls under this group and can produce confusing submissions. The US has a robust geological survey of surface and ground water sources. The drinking water reports, however, are supposed to go to the Environmental Protection Agency, and no one seems to be updating the database with information. In my assessment I reduced the score because both are supposed to be available in the public domain.

There are countries like Belgium where water management and monitoring are completely left to the local body and there is no central role for monitoring at all, which means there is no dataset.

There are countries, like France, where there is a strong central role in water management and a dataset could be made open. Korea stood out because they have real time water quality information from their treatment plants that gets updated to a website.

Then there are the ‘unsures’: countries that seem to treat water to some degree or have national drinking water monitoring programmes but don’t have data online, reports or any mention of data at all. This is not restricted to the developing world. I was very frustrated with several European countries whose newspaper articles are riddled with reports of how pristine and delicious their water is, but which don’t have a single public facing dataset.

Take Aways

The United Kingdom and the US, both pioneers of the open data movement, had terrible water quality data for water treatment, and no effort has been made to bring the data together or make it available in a real time fashion. It is also not clear to citizens who holds local bodies accountable for not updating their reports, making reports public, or finding ways to bring this data into the light so it can be usable. It is no wonder that the US is now on the cusp of a public health crisis.

It is frustrating that the open data movement hasn’t quite been able to reconcile decentralization and local responsibility with national level accountability and transparency. Public health is a national level issue even though local and regional contexts are required for management. How do we push for openness and transparency in systems like this?

In places like India, where water quality treatment is largely left to private players and huge populations are not receiving treated water, it is a must for data to be available, open, and in the hands of both central bodies and local players, because people need to find solutions and work out where to intervene. Given the huge problems with water borne diseases, the slow but epic arsenic and fluoride poisonings gripping parts of India, and the effects this will have for generations, making this data public, usable and demystified is no longer optional.

All in all, I have to say this was an enlightening experience, and it was cool to be able to learn something about each country. In our continuous push for open data we sometimes get lost in standards, formats, and machine readability, but taking a moment to really prioritize our values in society and have open data reflect them is essential. Public health outcomes, and engaging with complex issues like them, are an essential part of how to grow the open data movement and make it relevant to millions more.

*(Correction: Previous version said the survey included 148 countries, the actual number is 122.)