Ever since we started publishing Indian village boundaries in GeoJSON format, the single most question asked was how to convert geojson into Shapefile. To answer that question I created a screencast. Here it is.
The tool used is called MapShaper.
The 9th of August, 2015 marked 11 years of the OSM project. On the same weekend Datameet Pune fittingly held its second meetup, ‘Let’s talk Mapping’. The session was led by Devdatta (Dev) Tengshe, a veteran of the Bangalore Datameet group who has several years of experience in GIS and remote sensing having worked previously for ISRO. Dev initiated with a primer on what spatial data is and what can be done with spatial data, then followed with an introduction to GIS, a demonstration of OSM and information on sources for spatial data in the Indian context. His presentation can be found here. Below are the highlights of the session.
Spatial (data) is not necessarily ‘special’ as many say. It is simply data with a spatial element to it, this could be latitude-longitude but pin codes and postal addresses could be used as spatial formats too. There are numerous advantages to viewing/analyzing social sector data spatially, whether it is census data, land records, city water supply/sewerage networks or other datasets. Spatial representation helps detect patterns and trends that may otherwise go unnoticed. Spatial data in the social sector also comes with its set of challenges. Maps of land parcels for example are not recorded in any standardized way across the country, but instead using local landmarks (turn left at this tree, go straight for 50m, then turn right and head towards the banyan tree) Much of census data is also not easily available at the finer local levels, but only at the district level.
Spatial data can be used to solve spatial problems. Spatial data visualizations work with the strength of the human eye, which is to detect patterns visually. In the exploratory stage you may visualize it to detect patterns, e.g. a map of a user’s Facebook friends may unknowingly reveal areas of low internet penetration, a comparison of Bangalore’s bus routes vs Pune’s bus routes show a stark difference in connectivity. In further analysis you may also find spatial correlations. Spatial modelling is yet another application. These processes are in fact the same ones you would use with regular data, and like all other data, spatial data too requires a lot of cleaning.
The real world is infinitely complex. To represent this spatial world in data we have to develop simplified models. These can be either Vector or Raster models. In vector models, we use points, lines and polygons to represent real world features (e.g. bus stops, bus routes, ward boundaries) whereas in raster models we use images of the earth’s surface taken by satellites, or UAVs which are composed of pixels to view the earth’s surface.
File formats for spatial data:
shapefiles are used within desktop softwares (QGIS, ArcGIS), geojson is used for web mapping (these are light, human and machine readable), kml (first developed by Keyhole, later bought by Google) is also a common format.
tiff (multiple bands) format allows for storage of larger datasets.
Spatial databases are now able to handle spatial data, allows spatial queries related to it, so a user doesn’t have to write out the logic for such operations (e.g. of spatial queries: Find the nearest school/hospital to this village?). Spatial databases are used by retail businesses, housing, utilities and many other commercial ventures.
The Beg-Borrow-Steal theory
Create it yourself. In the process of field work you can use field kits to collect spatial data for your area of interest. Tools available for this include Locus map free – Outdoor GPS (App) OR Open Data Kit (Software suite). As an alternative, you may also digitize from satellite maps
Borrow and convert it
Data that may be available freely but not in a form that is easily usable and may need to be georeferenced.
Spatial data can be ‘scraped’ from websites that contain this data but do not make it easily available, see github datameet maps for examples of data collected from census websites. Although permission may not explicitly be given for this, since it is already up on the web and no copyright exists on the data it is implicitly understood to be open source.
The Wikipedia for spatial data, OSM, counts more than two million users who voluntarily contribute to the project. OSM was first aimed to collect just street data, but it has now expanded tremendously. City data in OSM is of high quality however for rural areas, only major roads can be guaranteed.
Unlike Google maps which does not allow a user direct access to its data, OSM raw data is available for download as well as editing. Within OSM users can tag different aspects of any object, giving others more information about it. Users can also introduce new key:value pairs if needed. OSM scripts monitor changes and an IRC chat room verifies these changes. OSM updates frequently and is therefore used in humanitarian situations (HOT OSM). Only 12 servers run all of OSM
Wikimapia in comparison is limited, it allows you to draw on google maps, but there is no verification of additions and limited data download.
There are independent initiatives to make available raw data download from OSM [See slide 47] Similarly other apps use and make available OSM data, Map quest for instance gives directions based on OSM data. If you are unsure of the final use of your data you can download data in OSM XML format, since it contains everything. GeoJSON is useful only when you need shapes, not other features of spatial data.
Shapefiles for districts and tehsils are available on Github, Datameet maps. However maps must be verified against other sources of data. In reality there is dispute even within the Indian government on how many districts India has.
In reality, in many cases no fixed village boundaries exist, the Census uses blocks and settlements for reference. Some states however make available static maps showing village boundaries that can be georeferenced.
Can we divide the country into pin codes? Pin codes do not represent an area, they are points along a line where the postman will deliver. Hence the assignment of addresses to the last three digits of a pin code is a decentralized decision. The lowest level of post offices decides. Pin codes also do not cover the entire country. Post offices to Pin codes do not have a one-to-one relation.
Census data at the finest spatial level comes down to census ward boundaries. Nobody outside the census department actually knows these boundaries. Pune city has 700 census ward boundaries (which do not correspond to administrative/electoral ward boundaries) mostly hand drawn. District level offices may have maps with these boundaries as hard copies.
Nothing in national policy disallows them from sharing them, but nevertheless government officials aren’t inclined to share such information. Certain limitations however do exist on government data sharing, protected military areas, areas near the national boundaries, topography maps etc. are prohibited.
Basemaps and DEMs (Digital Elevation Models)
The Open data initiative of the Government of India has created some 5400 odd ‘Open Series maps’ i.e. toposheets without height information. None of these are done digitally or printed. They can however be used with gps data since the lat-long is accurate.
Since GoI topography data isn’t made openly available, alternatives available are SRTM, ESTER and Bhuvan Cartosat. These are good for example for larger rural areas, but not feasible for urban areas. Private companies work with UAVs for very high resolution elevation data. For satellite imagery as basemaps, Landsat imagery, going back to 1970 is available.
In following up with our discussions on mapping, for those of you who are interested, we have several Pune specific mapping tasks that individuals can contribute to. E-mail us at firstname.lastname@example.org for more information. We hope that everyone found the discussion useful and thank you for coming, thanks to Dev for the informative session! Thanks to Shraddha and Thoughtworks Pune for hosting us. Do connect with us via social media [Twitter] or join our mailing list for information on the next meeting.
The July edition of GeoBLR featured Rahul RS from Onze Technologies. Onze is the prefered store locator infrastructure by several businesses in India including TVS, Dell and Cafe Coffee Day. The store locator is powered by Onze’s very own Latlong.in – extensive, web based points of interest and map data interface.
Rahul shared the story of Latlong.in, their infrastructure and challenges mapping Indian cities. They started out in 2007 at a time when there was no reasonable geographic data source available for India – commercial and non-commercial. Rahul’s team gathered toposheets from the Survey of India and georeferenced boundaries to incorporate into their maps. Rahul pointed out that these are inexpensive but high effort tasks. Plus, tools to do these are expensive.
In order to address India-specific mapping needs, geo-rectification needed to be inevitably supported by field surveys. Each city is unique and people entirely depend on landmarks and hyperlocal information to get around. Rahul brought in experts from different areas to gather local information. “The idea behind Latlong.in starts by saying that addresses don’t work in India”, says Rahul. When OpenStreetMap picked up, Latlong.in moved to a mix of their data and OSM that was maintained on their own. It is a complicated effort. Conflation and dealing with multiple revisions of data is tricky and there aren’t great tools to deal with it effortlessly. Latlong.in follows Survey of India’s National Map Policy. They avoid mapping defence and high security features.
Owning the entire data experience is critical to win in this market. Remaining open and improving continuously is the only way to keep your datasets upto date.
To kick things off in 2015, we met at the offices of the Centre for Internet and Society (CIS), Bengaluru to map the unmapped/less-mapped settlements along the proposed Delhi-Mumbai Infrastructure Corridor (DMIC) project. The DMIC, a 1,483 km-long development corridor spanning several states in northern and western India, has been attracting a lot of curiosity and criticism from the national and international participants and observers. The project will have built a dedicated freight corridor, several industrial and logistics hubs, and smart cities at its completion. The project has been structured to be constructed in phases. The pilot project for an integrated smart city, Dholera Special Investment Region (SIR), is underway.
The quality of mapping in many regions relies on a very active mapping community, or a strong interest from a collectives and local networks. We think it is important regardless to map the assets that pre-exist around the proposed sites of developments. With this in mind, we decided to take a look at the areas earmarked for the Dholera SIR (Gujarat), Shendra (Maharashtra), Mhow (Madhya Pradesh), and Dadri/ Greater Noida (NCR). The evening began with Tejas introducing the DMIC project, the scale of new development, and the need to capture these changes for years to come on OpenStreetMap (OSM). Sajjad provided a rapid tutorial on signing up for OSM, and using the browser-based map editor. The party was attended by guests at CIS as well as remotely from Bangalore and Dharamsala.
As the party progressed, several guests ended up mapping roads, buildings, and water bodies in the Dholera region. Others chose to similarly map Shendra, and Dadri.
Mumbai saw its third data meet on 26th October, 2014 with a total of 14 participants, in-spite of it being a Diwali weekend. This time around we decided to try out a new place and the venue was a roof top place located at Chium Village, Khar West. A nice cozy place but a tad bit difficult to find for people who are not familiar with the area.
A few weeks ago we held an Intro to Data Journalism Workshop. Josephine Joseph was in attendance, she regularly writes for Citizen Matters, Bangalore’s local paper that knows all. She was working on this story and has published it last week with Citizen Matters, I’m very happy to crosspost it here as a great example of local data journalism.
East Bangalore area, particularly Whitefield- KR Puram – Mahadevapura area, is on the prime real estate map. What are the projects coming up next? What are the implications?
Investing in real estate in Bangalore is a dream of any investor. However, is the growth of this sector in tune with the infrastructure that the city can handle?
A close look by Citizen Matters at 26 constructions coming up in Whitefield – KR Puram area in East Bengaluru shows some alarming observations. When the 8,000 flats are fully occupied, new residents will need 10,662.87 KL of water a day (equivalent of 1780 water tankers of 6000 Litres). More than 19,697 cars will add to Whitefield traffic.
Ministry of Environment and Forests (MoEF) rules make builders of projects of more than 20,000 sqm built up area, apply for an Environmental Clearance (EC) from the state, along with all the other permissions and NOC from BBMP, BWSSB, Karnataka Ground Water Authority (KGWA) to drill borewells prior to construction commencement.
The State Expert Appraisal Committee (SEAC) receives the applications and recommends checks and balances, prior to recommending a project for EC to the State Environment Impact Assessment Authority (SEIAA).
The SEIAA reviews project details, clarifies issues and only then is the EC issued. In cases where construction has begun without an EC, the builder is served with a show cause notice. The KSPCB can file cases against builders under the Environment Protection Act if they proceed with construction without an EC.
Read the rest over at Citizen Matters.
Great work Josephine!