Data{Meet} Pune, Second Meetup – Let’s talk Mapping

The 9th of August, 2015 marked 11 years of the OSM project. On the same weekend Datameet Pune fittingly held its second meetup, ‘Let’s talk Mapping’. The session was led by Devdatta (Dev) Tengshe, a veteran of the Bangalore Datameet group who has several years of experience in GIS and remote sensing, having previously worked for ISRO. Dev began with a primer on what spatial data is and what can be done with it, then followed with an introduction to GIS, a demonstration of OSM and information on sources of spatial data in the Indian context. His presentation can be found here. Below are the highlights of the session.

What is spatial data? Its uses?

Spatial (data) is not necessarily ‘special’ as many say. It is simply data with a spatial element to it. This could be latitude-longitude, but pin codes and postal addresses can serve as spatial formats too. There are numerous advantages to viewing and analyzing social sector data spatially, whether it is census data, land records, city water supply/sewerage networks or other datasets. Spatial representation helps detect patterns and trends that may otherwise go unnoticed. Spatial data in the social sector also comes with its own set of challenges. Maps of land parcels, for example, are not recorded in any standardized way across the country, but instead using local landmarks (turn left at this tree, go straight for 50m, then turn right and head towards the banyan tree). Much of the census data is also not easily available at the finer local levels, but only at the district level.

Spatial data can be used to solve spatial problems. Spatial data visualizations work with the strength of the human eye, which is to detect patterns visually. In the exploratory stage you may visualize the data to detect patterns, e.g. a map of a user’s Facebook friends may unintentionally reveal areas of low internet penetration, and a comparison of Bangalore’s bus routes with Pune’s bus routes shows a stark difference in connectivity. In further analysis you may also find spatial correlations. Spatial modelling is yet another application. These processes are in fact the same ones you would use with regular data, and like all other data, spatial data too requires a lot of cleaning.

IMG_20150808_174421

GIS 101

The real world is infinitely complex. To represent this spatial world in data we have to develop simplified models. These can be either vector or raster models. In vector models, we use points, lines and polygons to represent real world features (e.g. bus stops, bus routes, ward boundaries), whereas in raster models we use images of the earth’s surface taken by satellites or UAVs, which are composed of pixels.

File formats for spatial data:

Vector

Shapefiles are used within desktop software (QGIS, ArcGIS); GeoJSON is used for web mapping (these files are light, and both human and machine readable); KML (first developed by Keyhole, which was later bought by Google) is also a common format.
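To make the vector model concrete, here is a minimal sketch of the three geometry types written as GeoJSON using only Python's standard library. The feature names and coordinates are hypothetical and only roughly placed around Pune; they are not taken from the presentation.

```python
import json

# Hypothetical example features; coordinates are illustrative only.
# GeoJSON orders each position as [longitude, latitude].
bus_stop = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [73.8567, 18.5204]},
    "properties": {"kind": "bus_stop", "name": "Example Stop"},
}
bus_route = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[73.8567, 18.5204], [73.8600, 18.5250], [73.8650, 18.5300]],
    },
    "properties": {"kind": "bus_route", "name": "Example Route"},
}
ward = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # A single outer ring; the first and last positions must be identical.
        "coordinates": [[
            [73.85, 18.51], [73.87, 18.51], [73.87, 18.53],
            [73.85, 18.53], [73.85, 18.51],
        ]],
    },
    "properties": {"kind": "ward_boundary", "name": "Example Ward"},
}

# A FeatureCollection bundles the features into one document.
collection = {"type": "FeatureCollection", "features": [bus_stop, bus_route, ward]}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

The printed text can be saved with a .geojson extension and should open in QGIS or a web mapping library, which is what makes the format convenient for quick experiments.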

Raster

The TIFF format (multiple bands) allows for storage of larger datasets.

Spatial databases are now able to handle spatial data and allow spatial queries on it, so a user doesn’t have to write out the logic for such operations (an example of a spatial query: find the nearest school/hospital to this village). Spatial databases are used by retail businesses, housing, utilities and many other commercial ventures.
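For a sense of the logic a spatial database saves you from writing, here is a rough sketch of the "nearest school to this village" query done by hand. It uses the haversine great-circle formula and a brute-force scan; the village and school records are made up for illustration, and a real spatial database would typically use a spatial index instead of checking every candidate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, candidates):
    """Return the candidate closest to origin (brute force over all candidates)."""
    return min(
        candidates,
        key=lambda c: haversine_km(origin["lat"], origin["lon"], c["lat"], c["lon"]),
    )


# Hypothetical data, for illustration only.
village = {"name": "Example Village", "lat": 18.50, "lon": 73.80}
schools = [
    {"name": "School A", "lat": 18.52, "lon": 73.80},
    {"name": "School B", "lat": 18.60, "lon": 73.90},
    {"name": "School C", "lat": 18.50, "lon": 73.95},
]

closest = nearest(village, schools)
print(closest["name"])
```

With a spatial database the same question becomes a single query, and the index keeps it fast even with many thousands of schools.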

Where do I get spatial data?

The Beg-Borrow-Steal theory

Beg

Create it yourself. In the process of field work you can use field kits to collect spatial data for your area of interest. Tools available for this include Locus Map Free – Outdoor GPS (app) or Open Data Kit (software suite). As an alternative, you may also digitize from satellite maps.

Borrow and convert it

Data that may be available freely but not in a form that is easily usable and may need to be georeferenced.

“Steal”

Spatial data can be ‘scraped’ from websites that contain this data but do not make it easily available; see Datameet maps on GitHub for examples of data collected from census websites. Although permission may not explicitly be given for this, since the data is already up on the web and no copyright exists on it, it is implicitly understood to be open source.

Open Street Map (OSM)

OSM, the Wikipedia for spatial data, counts more than two million users who voluntarily contribute to the project. OSM first aimed to collect just street data, but it has now expanded tremendously. City data in OSM is of high quality; for rural areas, however, only major roads can be guaranteed.

Unlike Google Maps, which does not allow a user direct access to its data, OSM raw data is available for download as well as editing. Within OSM users can tag different aspects of any object, giving others more information about it. Users can also introduce new key:value pairs if needed. OSM scripts monitor changes and an IRC chat room verifies these changes. OSM updates frequently and is therefore used in humanitarian situations (HOT OSM). Only 12 servers run all of OSM.

IMG_20150808_173926

Wikimapia in comparison is limited: it allows you to draw on Google Maps, but there is no verification of additions and data download is limited.

There are independent initiatives to make raw data downloads from OSM available [See slide 47]. Similarly, other apps use and make available OSM data; MapQuest, for instance, gives directions based on OSM data. If you are unsure of the final use of your data you can download data in OSM XML format, since it contains everything. GeoJSON is useful only when you need shapes, not other features of spatial data.
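As a rough illustration of what the OSM XML format carries, the sketch below reads the key:value tags off each object in a tiny hand-written OSM XML snippet using Python's standard library. The snippet is invented for this example (ids, names and coordinates are hypothetical); real extracts are far larger and are usually processed with dedicated tools.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written OSM XML document; all ids and values are hypothetical.
SAMPLE_OSM_XML = """
<osm version="0.6">
  <node id="1" lat="18.5204" lon="73.8567">
    <tag k="amenity" v="school"/>
    <tag k="name" v="Example School"/>
  </node>
  <node id="2" lat="18.5210" lon="73.8570"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def read_tags(osm_xml):
    """Return one dict per node/way/relation with its key:value tags."""
    root = ET.fromstring(osm_xml)
    result = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        # Each <tag k="..." v="..."/> child is one key:value pair.
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        result.append({"type": element.tag, "id": element.get("id"), "tags": tags})
    return result


elements = read_tags(SAMPLE_OSM_XML)
for item in elements:
    print(item["type"], item["id"], item["tags"])
```

Objects without tags (like the second node here) still appear in the XML, which is part of why this format is the safe choice when the final use of the data is not yet known.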

Sources

  • Downloading OSM data for a country: Geofabrik
  • Downloading OSM data for any custom polygon: BBBike
  • Raw data based on particular data queries: Overpass Turbo
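As a hedged sketch of what a "particular data query" might look like, the helper below assembles an Overpass QL query for nodes carrying a given key:value tag inside a bounding box. The function name and the bounding box numbers are made up for illustration; the resulting text is meant to be pasted into Overpass Turbo, and this code does not contact any server.

```python
def overpass_query(key, value, south, west, north, east):
    """Build an Overpass QL query for tagged nodes in a bounding box.

    Overpass expects the bounding box as (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];\n"
        f'node["{key}"="{value}"]({bbox});\n'
        "out body;"
    )


# Hypothetical bounding box, roughly around Pune, for illustration only.
query = overpass_query("amenity", "hospital", 18.4, 73.7, 18.6, 74.0)
print(query)
```

Swapping the key and value (for example amenity=school) changes what is fetched without touching the rest of the query.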

Spatial data in the Indian context

Districts/Tehsils

Shapefiles for districts and tehsils are available on GitHub, in Datameet maps. However, maps must be verified against other sources of data. In reality there is dispute even within the Indian government on how many districts India has.

Village boundaries

In reality, in many cases no fixed village boundaries exist; the Census uses blocks and settlements for reference. Some states, however, make available static maps showing village boundaries that can be georeferenced.

Pin codes

Can we divide the country into pin codes? Pin codes do not represent an area; they are points along a line where the postman will deliver. The assignment of addresses to the last three digits of a pin code is a decentralized decision, made by the lowest level of post offices. Pin codes also do not cover the entire country, and post offices and pin codes do not have a one-to-one relation.

Census data

Census data at the finest spatial level comes down to census ward boundaries. Nobody outside the census department actually knows these boundaries. Pune city has 700 census ward boundaries (which do not correspond to administrative/electoral ward boundaries) mostly hand drawn. District level offices may have maps with these boundaries as hard copies.

Nothing in national policy disallows sharing them, but government officials nevertheless aren’t inclined to share such information. Certain limitations do however exist on government data sharing: data on protected military areas, areas near the national boundaries, topography maps etc. may not be shared.

Basemaps and DEMs (Digital Elevation Models)

The Open data initiative of the Government of India has created some 5400 odd ‘Open Series maps’ i.e. toposheets without height information. None of these are done digitally or printed. They can however be used with GPS data since the lat-long is accurate.

Since GoI topography data isn’t made openly available, the alternatives available are SRTM, ASTER and Bhuvan Cartosat. These are good, for example, for larger rural areas, but not feasible for urban areas. Private companies work with UAVs for very high resolution elevation data. For satellite imagery as basemaps, Landsat imagery going back to the 1970s is available.

Closing Remarks

In following up with our discussions on mapping, for those of you who are interested, we have several Pune specific mapping tasks that individuals can contribute to. E-mail us at [email protected] for more information. We hope that everyone found the discussion useful and thank you for coming, thanks to Dev for the informative session! Thanks to Shraddha and Thoughtworks Pune for hosting us. Do connect with us via social media [Twitter] or join our mailing list for information on the next meeting.

{Ahmedabad} – 3rd Meetup

This meetup was special as this was on my way back from the long drive. Since I was doing quite a bit of Open Data work on my trip, I thought I would talk about the same. So we had a long conversation about how we can contribute while on a long drive.

IMG_20150705_191653

The presentation is embedded below or you can check the presentation.

We discussed in detail the following services to which anyone can contribute

We also discussed the Android apps that can be used to collect and submit data.

Latlong’s story of mapping India

The July edition of GeoBLR featured Rahul RS from Onze Technologies. Onze is the preferred store locator infrastructure for several businesses in India including TVS, Dell and Cafe Coffee Day. The store locator is powered by Onze’s very own Latlong.in – an extensive, web based points of interest and map data interface.

2015-07-30 18.23.40

Rahul shared the story of Latlong.in, their infrastructure and the challenges of mapping Indian cities. They started out in 2007, at a time when there was no reasonable geographic data source available for India, commercial or non-commercial. Rahul’s team gathered toposheets from the Survey of India and georeferenced boundaries to incorporate into their maps. Rahul pointed out that these are inexpensive but high effort tasks. Plus, tools to do these are expensive.

In order to address India-specific mapping needs, geo-rectification needed to be inevitably supported by field surveys. Each city is unique and people entirely depend on landmarks and hyperlocal information to get around. Rahul brought in experts from different areas to gather local information. “The idea behind Latlong.in starts by saying that addresses don’t work in India”, says Rahul. When OpenStreetMap picked up, Latlong.in moved to a mix of their data and OSM that was maintained on their own. It is a complicated effort. Conflation and dealing with multiple revisions of data is tricky and there aren’t great tools to deal with it effortlessly. Latlong.in follows Survey of India’s National Map Policy. They avoid mapping defence and high security features.

Owning the entire data experience is critical to win in this market. Remaining open and improving continuously is the only way to keep your datasets up to date.

International Open Data Charter, Consultation Meeting

When: Tuesday, July 28, 2015, 5:30 pm

Where: CIS office, Number 194, 2nd ‘C’ Cross, Domlur, 2nd Stage, Bangalore 560071 (opposite Domlur Club and near the TERI building).

This is to invite you to a consultation meeting on the first public draft of the International Open Data Charter organised by CIS with DataKind and DataMeet at the CIS office in Bengaluru, on Tuesday, July 28, 2015, at 5:30 pm.

The Charter is being developed by the Open Data Working Group of the Open Government Partnership in consultation with a number of international organisations. Meant for approval and implementation by national governments, the Charter has five key principles:

– Open by Default;
– Quality and Quantity;
– Useable by All;
– Engagement and Empowerment of Citizens; and
– Collaboration for Development and Innovation.

The first public draft of the International Open Data Charter was published at the end of May 2015 at the International Open Data Conference in Ottawa, and can be accessed here: http://opendatacharter.net/charter/.

Organisations and individuals are invited to submit comments directly on the Charter page, before July 31.

We are organising this meeting to discuss the context, the drafting process, and the objectives of this document, and to encourage the participants to comment on the existing text of the Charter.

We keenly look forward to your participation in the consultation meeting on Tuesday.

Data{Meet} Pune – First Meetup

Datameet Pune hosted its first meetup last Monday, the 13th of July, at Thoughtworks, Pune. The idea of DataMeet, which originated in Bangalore as a community of data enthusiasts working on civic issues, has now spread to several cities across the country, Pune being the latest.

Datameet Pune - First Meetup (1)

Twenty-six people of diverse backgrounds, both from the programming world (students and professionals) and those conversant with social sector issues (NGOs and citizens), attended the meeting (including 3 via Google Hangout). An icebreaker and a game of Pune related trivia got the meeting off to a start. Participants introduced themselves and their broad areas of interest. Ideas revolved around public transport, voter registration, land use change, water and sanitation, waste management, education, mapping, data visualization and more. The organizers then gave a brief presentation on the idea of DataMeet, examples of data successes in the social sector elsewhere and the possible scope of projects that can be explored within the Pune group. Nikhil welcomed those interested to pitch in on some of his projects related to Pune’s bus routes management system and Pune’s budget sheet.

Datameet Pune - First Meetup (2)

The floor was then opened to the participants for Q&A and ideas. Participants discussed the format of further engagement within the group. They agreed that it would be best to start off with monthly meetings organized around topics (related to data and civic issues) where a speaker could initiate discussion based on his/her experience. Topics suggested were mapping, basic statistics, R/Python, better data analysis with Excel, etc. Dev, Vinayak and Rasagy, originally from the Bangalore DataMeet, agreed to initiate discussions on possible topics. Rahul urged that the topics taken up by speakers should have a practical rather than theoretical orientation, since seeing practical applications tends to interest people more. Sanskriti also suggested sector specific meetups, for example on transport, since the Pune public transport service (PMPML) is launching a new BRT route. Participants were briefed about hackathons and Open Data Camps (ODCs) which have happened in other cities and it was suggested that Pune could explore these formats as well.

The forum for online engagement of the Pune group, suggested by Vinayak, was Slack.com, to which everyone was agreeable (a Slack channel was later set up for the Pune group on the main Datameet Slack). For in-person meetings, everyone agreed to meet once a month, and Saturday was the day agreeable to most; early evening or morning were suggested as possible timings. Additional venues, including CEE, Drive Change, Flame University and Indradhanushya, were also suggested. A meetup page was set up by Anurag for updates about future meetups.

Participants were also strongly urged to fill out the DataMeet Pune Interest Form to hear about future activities, available here. The meeting was overall a great success, with the participants showing a lot of enthusiasm for actively collaborating. Please stay tuned for announcements of future meetings. In the meanwhile you can find the Google Hangout recording of the meeting here. For Pune specific queries please email [email protected] or contact Craig/Nikhil.

Craig D: 7276085960, [email protected] or Nikhil VJ: 9665831250, [email protected]

Mumbai Meet 7: Housing.com Talks On Analytics, Ask How India’s Story Telling Techniques

IMG_0072

The Mumbai Data Meet kicked off its 7th session with two prominent speakers. The event was held on 30th May at the Sardar Patel Institute of Technology and was attended by more than 50 people.

The first talk was by the team at Housing.com, which is an online real estate portal. The talk was shared between Paul Meinshausen, VP of Data Science, and Sourabh Rohilla, Data Scientist.

IMG_0080

The second talk was by Yogesh Upadhyaya of AskHowIndia.org. His talk was centered around the use of data and visual story telling techniques.

IMG_0104

You can listen to the entire talk here.

And you can read more about it by Sidharth Shah over here. He has summarised the entire talk.

The next Mumbai Data Meet, the 8th, will be held on 17th July.

After the success of the “Data Science Hackathon” that was co-hosted with Zone Startups and a BFSI company, we are now co-organising another hackathon along with a corporate and Zone Startups.

Click here for more details.

IMG_0070

Maps For Disaster Preparedness

Datameet, Mapbox and Akvo Foundation are organizing an OpenStreetMapping Party on 4th of July 2015 in New Delhi.

We are getting together to map a few Indian cities and villages – improving road networks and infrastructure data in OpenStreetMap – the largest living map of the world. Join us to learn how to map on OpenStreetMap.

The Humanitarian OpenStreetMap Team activates in times of crisis to support responding organisations with map data, helping them better plan disaster response. Mappers around the world get together to improve road networks and infrastructure data in OpenStreetMap. The Humanitarian OpenStreetMap team leverages the OpenStreetMap platform in various directions to support crisis management with map data.

The impact we can make during a crisis is entirely dependent on the availability of map data. We will focus on understanding how maps and data can be used in the Indian context through a hands-on workshop, and integrate many of the lessons we learned first hand from the most recent earthquake that struck Nepal, where over 2,000 volunteers from across the world came together to support the response efforts, quadrupling road mileage and adding 30% more buildings in the most affected regions of Nepal.

Please visit the event page to RSVP or join the Facebook Event.

Mumbai Meet 6: Data Science Hackathon

DataMeet 6 was a 2-day Data Science Hackathon organised by a BFSI company, Zone Startups and DataMeet Mumbai. The hackathon took place in the Bombay Stock Exchange Building at Zone Startups’ office. Twelve teams participated. These included teams of young data enthusiasts and specialist data science teams from companies like TCS and Housing.com.

The BFSI company opened up 80GB of its real transactional data in a secure environment to the participating data enthusiasts.

The teams were expected to analyze the data and draw out insights that would be relevant to their use case scenarios such as Health Bankruptcy or pull out a trend which is hidden and unknown to the BFSI company. Teams were free to use any tool of their choice from R, Python, Tableau, etc.

Each team was provided an individual secure Oracle DB connection from which they could query the data but not download it. The Oracle DB connections were opened only to the static IPs of the Zone Startups office, and traffic to and from the servers was monitored to guard against downloading of the data.

Day 1

The day started with various teams analysing the raw data, tables, meaning of columns. The representatives from the BFSI company also gave a briefing about objectives.

600_437353359

Day 2

Many of the young teams did not turn up on Day 2 due to the complexity of the problem. At the end of Day 2, the judges from the BFSI company evaluated each team’s progress and gave feedback and suggestions.

600_437353364

Delhi Jal Board and Open Water Data: Report from DataMeet-Up

A DataMeet-Up was held on Thursday, April 30, 2015, at the Akvo office to discuss the Summer Action Plan prepared by the Delhi Jal Board (DJB) and the data concerns thereof. Sundeep Narwani of Delhi Dialogue Commission presented the Action Plan. Kapil Mishra, MLA and Vice-Chairman of DJB, participated in the discussions and described the planned activities at DJB.

Here are the minutes of the meeting, prepared by Sandeep Mertia.

Sundeep Narwani started with a presentation on Delhi Jal Board’s Summer Action Plan (SAP) 2015. Some important points from his presentation were:

  • SAP is a short term measure for three months of summer
  • Big problem: 40% of Delhi does not have enough piped water networks, and thus tanker services and unauthorised supply exist
  • They have planned several measures for improving the systems of – tube wells, infrastructure (replacing old lines) and repairs, grievance redressal and sewage treatment.

The details are available in the Summer Action Plan document.

This was followed by a long discussion session on several issues and concerns – related to water problems in Delhi. Some of the important questions, comments and suggestions were:

  • What is the authenticity of data which DJB has?
    Answer: Doubtful.
  • What’s the organisational structure of DJB, and its relationship with the MCD?
    Answer: DJB is a state body, independent from MCD
  • There is no data on bulk supply to colonies
  • Very little end user data. Lack of meter reading and averaged bills are part of the reasons for this problem
  • There is no way to interconnect supply between localities
  • No data on quality of water
  • Dr. Rajinder Kaur spoke about using existing spatial data maps of NCT/NCR, and the research conducted by the Indian Agricultural Research Institute on using spatial data for classifying ground water depth and quality
  • Dr. Renu Khosla, from Centre for Urban and Regional Excellence spoke about using GIS data for slums
  • Mr. Kapil Mishra, the Vice-Chairman of the DJB spoke at length about how they plan to transform the DJB. Also, he promised all data sharing from DJB’s side.

After a general discussion on various water related issues in Delhi, in the last segment we focused on framing the data problems associated with SAP 2015.

  • Need to think about the water data which already exists.
  • Sundeep will put up a list of people and organisations which have data on water in Delhi
  • A suggestion was made to focus on Gram Sabha level data as well
  • Need to prioritize the issue of water access to all, the missing data on ‘access’ related problems and appropriate mechanisms
  • Some private bodies have been collecting data from GPRS meters, let’s try to open this data
  • Need to map the borewells
  • Need to interpret and understand the data which DJB requires.

To Do list

  • We will list out all data sets that have informed the SAP document [Time: 2 weeks]
  • Sundeep will share the data sets already available from DJB
  • Once the list of data sets is prepared by us, it will be submitted to DJB via Sundeep and Kapil, who will then see if the mentioned data sets can be opened up. [Approximate time: 1 month]
  • Once these data sets are available, we will evaluate the quality of these data sets
  • Identify data sets that are missing and the ones that require a better collection process

Resources

We are using a Google spreadsheet to list out all data sets that informed the Summer Action Plan.

We are using HackPad to collect various resources.

Images of notes taken by Namrata Mehta and Sumandro Chattapadhyay at the meeting:

DataMeet_2015.04.30_DJB_Water_Data_Notes_01

DataMeet_2015.04.30_DJB_Water_Data_Notes_02

DataMeet_2015.04.30_DJB_Water_Data_Notes_03

DataMeet_2015.04.30_DJB_Water_Data_Notes_04

{Ahmedabad} – 2nd Meetup

Data{Meet} Ahmedabad – 2nd Meetup

Our 2nd meetup was held at IIM-A, under the aegis of the RTE Resource Centre, with 20 participants; half of them had attended the 1st meetup.

Talk #1: All walls come down – by Ashish Ranjan, RTE Resource Centre, IIM-A

The first talk in the 2nd DataMeet of Ahmedabad Chapter brought forward the efforts being put together by the team RTE, working out of IIM-Ahmedabad. The team members present at the venue were Prof. Ankur Sarin, Ashish Ranjan, Advaita R and Nishank Varshney. Ashish presented their journey of supporting the implementation of RTE in the state of Gujarat.

The Right to Education (RTE) act Section 12 requires schools to enrol a certain number of children from economically weaker families. The RTE Resource Centre (rterc.in) organises pre-enrolment campaigns for the benefit of prospective students and their parents, and has enlisted NGOs for hand-holding the children post-enrolment. The talk gave a glimpse of their experience in Ahmedabad, observations from Maharashtra, and the data-related challenges they faced.

datameet2

The management of this important activity was being done manually. This threw up many problems:

  • The registration of beneficiary families was often incomplete, with partial addresses recording just the area of residence, e.g. “Jamalpur”. This led to many parents complaining about non-receipt of allotment letters.
  • There was no mapping of schools or beneficiary families, which could have aided better matching of children and schools.

A study by the RTE team revealed how a large number of schools were finding their way around the RTE mandate. Their methods included demanding unnecessary documentation from parents and tricking the MIS systems that accept applications from parents into counting eligible children as ineligible by age, amongst various others. Nishank pitched in with instances from Maharashtra, where the minimum and maximum permissible age limits were deliberately entered by schools in such a way that potential students would be under age during an admission year, and over age the next year, effectively excluding them. In some particularly bad cases, the difference was one day: the child would have to be born on a specific date. For lack of efficient and transparent allotment processes, there were cases of candidates getting multiple admissions (as many as 18) while some did not get any. To bring out all these analyses, though, the school and student data from the Maharashtra RTE website had to be painstakingly downloaded, manually. Many DMers offered support to gather this data more easily.

datameet3

The team was quite inspired by the school map of the Karnataka Learning Partnership (klp.org.in/map) and wants to build such a comprehensive tool for themselves, with features to find schools within a specified distance, and help match students with schools. Unlike the Karnataka programme, there still is no MIS in place to facilitate the enrolment and selection process. Shravan suggested that it might be possible to use the codebase of KLP and adapt it for use in Ahmedabad. Hopefully, the D{M} folks will volunteer for the necessary support.

The RTE team also wants to build a tool to track the performance of enrolled students. They discussed the potential privacy issues involved in this. It was suggested that the performance reports published on the website could be at an appropriate level of aggregation, which safeguards privacy while preserving discernible performance stats. The possibility of using ODK for volunteer led data collection was also discussed.
Getting together at the meetup opened up many possibilities for collaboration from the participants as a few of them came forward with suggestions and also extended their support to this cause.

Talk #2: Public Transport of Ahmedabad

Jayesh Gohel is not your everyday architect. He dropped out of his course at CEPT because he got too interested in code and soon enough he started enjoying making websites. Being an Amdavadi, he noticed the lack of infrastructure, both digital and non-digital, supporting the commuting that AMTS enables in the city, and so he decided to work on amtsinfo.in – the unofficial official support and information website for the Ahmedabad Municipal Transport Service.

Jaye

At the 2nd Datameet in Ahmedabad, Jayesh inspired the audience with his experience of developing the website with the sole aim of solving the information problem around the rather important and convenient network that AMTS is. Jayesh’s talk was simple and spoke about his personal motivations and learnings in the course of developing this app. It also brought to light the issues that plague the archaic systems that govern our modern lives, which can otherwise be so easily solved with the use of digital technology. However, ‘there’s hope if all of us take initiatives’, Jayesh said.

DataMeet is a community of Data Science and Open Data enthusiasts.