Category Archives: News

Demonetisation with Srinivasan Ramani

Srinivasan Ramani is a Deputy National Editor at The Hindu, where he works with data. He has been a long-time member of the DataMeet community. This week I caught up with him to talk about the Government of India’s demonetisation move.


Show Notes

Crossposted.

OpenPostbox.org

We got a chance to talk to members of the Karnataka Philatelic Society about OpenPostBox. They are a very interesting set of people. They have also started sending me postbox pictures over WhatsApp, along with the locations. Now I need to find an efficient way to extract them and insert them into my database.

For now I am thinking of Export -> Parse -> Insert, and I am working on it. If you have any ideas, do email me.
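One way the Export -> Parse -> Insert idea could look is sketched below. It is only a sketch under several assumptions: that shared locations appear in the exported chat text as a maps link ending in `?q=<lat>,<lon>`, that the target is an SQLite database, and that the table is called `postboxes`. None of these details come from the actual OpenPostBox setup, and matching photos to locations is left out.

```python
import re
import sqlite3

# Assumption: a shared location shows up in the exported chat text as a
# maps link ending in "?q=<lat>,<lon>". Real exports may look different.
LOCATION_RE = re.compile(r"\?q=(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def parse_locations(chat_text):
    """Return (latitude, longitude, source line) for each line holding a location."""
    found = []
    for line in chat_text.splitlines():
        match = LOCATION_RE.search(line)
        if match:
            found.append((float(match.group(1)), float(match.group(2)), line.strip()))
    return found


def insert_postboxes(conn, rows):
    """Insert parsed rows into a hypothetical `postboxes` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postboxes "
        "(latitude REAL, longitude REAL, source_line TEXT)"
    )
    conn.executemany("INSERT INTO postboxes VALUES (?, ?, ?)", rows)
    conn.commit()


# Made-up sample standing in for an exported chat file.
sample_export = """\
12/03/2016, 10:15 - Member A: IMG-20160312-WA0001.jpg (file attached)
12/03/2016, 10:16 - Member A: location: https://maps.google.com/?q=12.9716,77.5946
12/03/2016, 10:20 - Member B: Nice postbox!
"""

conn = sqlite3.connect(":memory:")
rows = parse_locations(sample_export)
insert_postboxes(conn, rows)
print(conn.execute("SELECT latitude, longitude FROM postboxes").fetchall())
```

Keeping the source line next to each coordinate pair makes it easier to go back to the chat and find the matching photo by hand.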


Details of the meet are on my personal blog if you would like to read more.

Five Years of DataMeet Discussions

We consider 26/01/2011 to be DataMeet’s birthday: that is the day we talked about starting DataMeet. The first email to the group, however, was sent by S.Anand on 27/01/2011. It has been five years since that first email. I took this opportunity to scrape the email list to see how we are doing and what we have talked about in the last five years.

Growth

Activity

Members have started 1525 topics and have sent 4570 emails in total. What matters most, though, is how many members participate.

Category | Members
No emails | 855
1 email | 184
2 emails | 75
3 emails | 43
More than 3 | 189
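To put the participation numbers in proportion, the counts in the table above can be turned into shares. This is a small sketch that assumes the five categories together cover every member of the group; the variable names are mine.

```python
# Member counts by number of emails sent, copied from the table above.
members_by_emails = {
    "No emails": 855,
    "1 email": 184,
    "2 emails": 75,
    "3 emails": 43,
    "More than 3": 189,
}

# Assumption: the five categories add up to the full membership.
total_members = sum(members_by_emails.values())
active_members = total_members - members_by_emails["No emails"]

for category, count in members_by_emails.items():
    print(f"{category:<12} {count:>5} {count / total_members:6.1%}")

print(f"Members who sent at least one email: {active_members} of {total_members}")
```

On these figures, a little over a third of the members have sent at least one email.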

Discussions

Have a look at the full view of the traffic graph. Except for a few peaks, the group’s activity has been fairly consistent.

Starters

We have discussed 1525 topics in the last five years. Here is the list of top 20 starters.

Author | Topics started
Nisha Thompson | 199
Thejesh GN | 164
sumandro | 71
Sridhar Gutam | 64
srinivas kodali | 36
Gautam John | 30
Sajjad Anwar | 28
Pranesh Prakash | 27
bawaza…@gmail.com | 27
Venkatraman.S. | 23
satyaakam | 22
S Anand | 21
Balaji Subbaraman | 20
Nikhil VJ | 19
Justin Meyers | 15
Sanky | 15
Dilip Damle | 14
Maya Indira Ganesh | 13
Shree | 13

First Responders

First responders are important when someone posts a question: they are the first to reply. As you would have guessed, this list is different from the starters list.

Author | First responses
Devdatta Tengshe | 36
Gautam John | 36
Nisha Thompson | 57
srinivas kodali | 28
Thejesh GN | 27
Sajjad Anwar | 21
satyaakam | 20
Arun Ganesh | 16
Avinash Celestine | 15
Venkatraman.S. | 15
Anand Chitipothu | 14
sumandro | 13
Dilip Damle | 10
JohnsonC | 10
S Anand | 10
Gora Mohanty | 9
Meera K | 9
Sabarish Karunakar | 9
Nikhil VJ | 8

Part of many discussions

These are the members who have participated the most.

Author | Emails sent
Nisha Thompson | 397
Thejesh GN | 297
Gautam John | 158
srinivas kodali | 128
sumandro | 109
Sajjad Anwar | 93
Arun Ganesh | 88
Dilip Damle | 88
Devdatta Tengshe | 85
satyaakam | 83
Sridhar Gutam | 81
Avinash Celestine | 73
Justin Meyers | 71
S Anand | 68
Pranesh Prakash | 67
Venkatraman.S. | 64
Nikhil VJ | 55
Raphael Susewind | 55
Anand Chitipothu | 51

Topics

We have discussed many topics over the years, but some have been especially popular. Here is the list of topics with the most replies.

Starter | Date/time | Topic
Karthik Shashidhar | 2015-05-04 23:00:01 | Shapefiles for "complete" India
megha | 2014-04-10 14:10:21 | MP/MLA Shapes
Srihari Srinivasan | 2013-03-06 22:59:44 | List of BMTC Bus stops
Nisha Thompson | 2014-05-20 23:51:49 | Logo Contest Voting!
S Anand | 2016-02-01 18:31:38 | PIN code geocoding
Siddarth Raman | 2014-04-17 16:16:29 | Parliamentary Constituency to Assembly Constituency to Ward linkages
Nisha | 2013-04-15 09:44:21 | April's Bangalore DataMeet
Gautam John | 2012-04-14 09:49:50 | I Change My City
Arun Ganesh | 2011-03-14 11:23:25 | Licensing crowdourced data projects
Sharad Lele | 2015-11-27 19:59:49 | Census of India seems to have maps of everything!

We also get quite a bit of traffic through search engines. So here is the list of top topics by views.

Starter | Date/time | Views | Topic
Karthik Shashidhar | 2015-05-04 23:00:01 | 12324 | Shapefiles for "complete" India
S Anand | 2016-02-01 18:31:38 | 4783 | PIN code geocoding
srinivas kodali | 2013-07-01 12:49:33 | 2291 | GeoJson data of Indian states
Aashish Gupta | 2014-02-24 10:23:12 | 763 | 1981 and 1991 district-wise census data
Justin Meyers | 2014-07-26 22:05:13 | 668 | Updated Taluk Shapefile!!
indro ray | 2013-08-13 10:21:18 | 651 | MCD Delhi Admin Boundary GIS map
My profile photo | 2012-08-30 17:41:45 | 615 | Bangalore – BBMP ward boundaries – shape files available now
megha | 2014-04-10 14:10:21 | 556 | MP/MLA Shapes
Kavita Arora | 2012-09-13 23:32:25 | 546 | Ward Wise data for Bangalore – 2011 census?
Renaud Misslin | 2014-12-03 09:45:16 | 426 | Delhi ward shapefile for census 2011 data

Finally, the customary word cloud of topics.


Of course, all the scrapers and data are available on GitHub. Go ahead and make your own visualizations.

Nobel Prize Winner Angus Deaton on the Importance of Open Data in India

On Data{Meet} we have been talking about the importance of open data and its quality. Angus Deaton, this year’s winner of the Nobel Prize for Economics, has a similar point of view on the quality of open data. The whole article is worth reading; I am quoting an excerpt.

My work shows how important it is that independent researchers should have access to data, so that government statistics can be checked, and so that the democratic debate within India can be informed by the different interpretations of different scholars. High quality, open, transparent, and uncensored data are needed to support democracy.

I have used data from India’s famous National Sample Surveys to measure poverty. Perhaps the biggest threat to these measures is that there is an enormous discrepancy between the National Accounts Statistics and the surveys. The surveys “find” less consumption than do the national accounts, whose measures also grow more rapidly. While I am sure that part of the problem lies with the surveys—as more people spend more on a wider variety of things, the total is harder to capture—but there are weaknesses on the NAS side too, and I have been distressed over the years that critics of the surveys have got a lot more attention than critics of the growth measures. Perhaps no one wants to risk a change that will diminish India’s spectacular (at least as measured) rate of growth?

Source: TheWire
Picture credit: Nobel Prize

OpenDataCamp Delhi 2014 in Tweets


Filter coffee at #odcdel14


Rebuilding the Karnataka Learning Partnership Platform

The Karnataka Learning Partnership recently launched a new version of their platform. This post talks about why they are building this and also some of the features and details. This is cross-posted from their blog.

Over the past five months we have been busy rearchitecting our infrastructure at Karnataka Learning Partnership. Today, we are launching the beta version of the website and the API that powers most of it. There are still a few rough edges and incomplete features, but we think it is important to release early and get your feedback. We wanted to write this blog post along with the release to give you an overview of what has changed and some of the details of why we think this is a better way of doing it.

Data

We have a semi-federated database architecture. There is data from Akshara, Akshaya Patra, DISE and other partners; geographic data, aggregations and meta-data to help make sense of a lot of this. From our experience, PostgreSQL is perhaps the most versatile open-source database management system out there, especially when we have large amounts of geographic data. As part of this rewrite, we upgraded to PostgreSQL 9.3, which means better performance and new features.

Writing a web application which reads from multiple databases can be a difficult task. The trick is to make sure that there is the right amount of cohesiveness. We are using Materialized Views in PostgreSQL. A Materialized View is a database object that stores the result of a query in an on-disk table structure. Materialized Views can be indexed separately and offer higher performance and flexibility than ordinary database views. We bring the data in multiple databases together using Materialized Views and refresh them periodically.
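As a rough illustration of the create, index and refresh cycle described above, here is a small Python helper that builds the three PostgreSQL statements involved. The view, table and column names are hypothetical and not taken from the KLP schema, and the sketch assumes the source tables are already visible to the database that holds the view; how the remote tables are exposed is not covered here.

```python
def materialized_view_statements(view_name, select_sql, index_column):
    """Build PostgreSQL statements to create, index and refresh a materialized view.

    Identifiers are interpolated as-is, so only pass trusted names.
    """
    create = f"CREATE MATERIALIZED VIEW {view_name} AS {select_sql};"
    index = f"CREATE INDEX {view_name}_{index_column}_idx ON {view_name} ({index_column});"
    refresh = f"REFRESH MATERIALIZED VIEW {view_name};"
    return create, index, refresh


# Hypothetical names: `schools` and `dise_facilities` stand in for tables
# that originate in different source databases.
create_sql, index_sql, refresh_sql = materialized_view_statements(
    view_name="mvw_school_facilities",
    select_sql=(
        "SELECT s.id AS school_id, s.name, d.toilet_count "
        "FROM schools s JOIN dise_facilities d ON d.school_id = s.id"
    ),
    index_column="school_id",
)

for statement in (create_sql, index_sql, refresh_sql):
    print(statement)
```

The first two statements would be run once; the `REFRESH` statement is the one to schedule periodically, since a materialized view only changes when it is refreshed.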

We have a few new datasets – MP/MLA geographic boundaries, PIN code boundaries and aggregations of various parameters for schools.

API

The majority of the effort during the rewrite went into the API, the user interface and the user experience. We started by writing down some background. The exhaustive list of things that the API can do is here.

We have a fairly strong Python background and it has proven to be sustainable at many levels. Considering the skill-sets of our team and our preference for readable, maintainable code, Django was an obvious choice as our back-end framework. Django is a popular web development framework for Python.

Since we were building a fairly extensive API including user authentication, etc., we quickly realized that it would be useful to use one of the many API frameworks built on top of Django. After some experimentation with a few different frameworks, we settled on using Django-Rest-Framework. Our aim was to build on a clean, RESTful API design, and the paradigms offered by Rest-Framework suited that perfectly. There was a bit of a learning curve to get used to concepts like Serializers, API Views, etc. that Rest-Framework provides, but we feel it has allowed us to accomplish a lot of complex behaviours while maintaining a clean, modular, readable code-base.

Design

For our front-end, we worked with the awesome folks at Uncommon, who provided us with gorgeous templates. After lengthy discussions and evaluating various front-end frameworks, we felt that none of them quite suited what we were doing and that they all involved too much overhead. Most front-end frameworks are geared toward making Single Page Apps, and while each of our individual pages has a fair amount of complexity, we did not want to convert everything into a giant single page app; our experience has shown that this can quickly lead to spiraling complexity, regardless of the framework one uses.

We decided to keep things simple and use basic modular JavaScript concepts and techniques to provide a wrapper around the templates that Uncommon had provided and to talk to our API to get and post data. This worked out pretty well, allowing us to keep various modules separated, re-use code provided by the design team as much as possible, and not have to spend additional hours and days fighting to fit our code into the conventions of a framework.

All code, design and architecture decisions are in the open, much like how the rest of our organisation works. You can see the code and the activity log in our GitHub account.

Features

For the most part, this beta release attempts to duplicate what we had in v10.0 of the KLP website. However, there are a few new features, a few features that have not yet made it through, and a number of features and improvements due in future revisions.

Aside from the API, there are a few important new features worth exploring:

  1. The compare feature available at the school and pre-school level. This allows you to compare any two schools or pre-schools.

    1. Planned Improvements: The ability to compare at any level of the hierarchy: a block to a block, or even a block to a district, etc.

  2. The volunteer feature allows partner organisations to post volunteer opportunities and events at schools and pre-schools. It also allows users to sign up for such events.

    1. Planned Improvements: Richer volunteer and organisation profiles and social sharing options.

  3. The search box on the map now searches through school names, hierarchy (district, block etc.) names, elected representative constituency names and PIN Codes.

    1. Planned Improvements: To add neighbourhood and name based location search.

  4. An all new map page powered by our own tile server.

  5. Our raw data page is now powered by APIs and the data is always current unlike our previous version which had static CSV files.

    1. Planned Improvements: To add timestamps to the files and to provide more data sources for download.

Now that we have a fairly stable new code base for the KLP website, there are a few features from the old site that we still need to add:

  1. Assessment data and visualisations of class, school and hierarchy performance in learning assessments need to be added. We have chosen not to add them just yet because we are modifying our assessment analysis and visualisation methodology to make it simpler to understand.

  2. Detail pages for higher levels of aggregation – like a cluster, block and district with information aggregated to that level.

  3. A refresh of the KLP database to bring it up to date with the current academic year. None of these three has been done yet, for the same reason: each requires an exhaustive refactor of the existing database to support the new assessment schemas and the aggregation and comparison logic.

Aside from the three above, we have a few more features that have been designed and written but did not make it into the current release.

  1. Like the volunteer workflow, we have a donation workflow that allows partner organisations to post donation requirements, including in-kind donations, on behalf of the schools and pre-schools they work with. For example, a school might want to set up a computer lab and requires a number of individual items to make it happen. Users can choose to donate either the entire lab or individual items and the partner organisation will help deal with the logistics of the donation.

Our next release is due in mid-October and will include the volunteer workflow and squish bugs. After that, we will have a major release in mid-January with the refactored databases, all of the changes that they enable, and all the planned improvements listed above. And yes, we do have a mobile application on our minds too.

The DISE application will also be updated with the current year’s data by November. We will add the ability to compare any two schools or hierarchies by December.

So that’s where we are, four years on. The KLP model continues to grow, and we now believe we have a robust base on which to build rapidly and deploy continuously.

For the record, this is version 11. 🙂

Crosspost: Adding stress to a stressed area!

A few weeks ago we held an Intro to Data Journalism Workshop. Josephine Joseph, who regularly writes for Citizen Matters, Bangalore’s local paper that knows all, was in attendance. She was working on this story at the time and published it last week with Citizen Matters. I’m very happy to crosspost it here as a great example of local data journalism.

26 projects could: add 19,000 cars to Whitefield traffic, up water demand by 10.5 million litres

East Bangalore area, particularly Whitefield- KR Puram – Mahadevapura area, is on the prime real estate map. What are the projects coming up next? What are the implications?

Investing in real estate in Bangalore is a dream of any investor. However, is the growth of this sector in tune with the infrastructure that the city can handle?

A close look by Citizen Matters at 26 constructions coming up in Whitefield – KR Puram area in East Bengaluru shows some alarming observations. When the 8,000 flats are fully occupied, new residents will need 10,662.87 KL of water a day (equivalent of 1780 water tankers of 6000 Litres). More than 19,697 cars will add to Whitefield traffic.
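The tanker equivalence quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two figures from the paragraph (daily demand in kilolitres and tanker size in litres); the variable names are mine.

```python
import math

water_demand_kl_per_day = 10_662.87  # daily demand quoted in the article
tanker_capacity_litres = 6_000       # tanker size quoted in the article

demand_litres = water_demand_kl_per_day * 1_000  # 1 KL = 1,000 litres
tankers_exact = demand_litres / tanker_capacity_litres
tankers_needed = math.ceil(tankers_exact)  # a partly filled tanker is still a trip

print(f"{demand_litres:,.0f} litres/day is about {tankers_exact:.1f} tanker loads")
print(f"Whole tankers needed: {tankers_needed}")
```

This gives roughly 1,777 tanker loads a day, which the article rounds to 1780.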

Ministry of Environment and Forests (MoEF) rules make builders of projects of more than 20,000 sqm built up area, apply for an Environmental Clearance (EC) from the state, along with all the other permissions and NOC from BBMP, BWSSB, Karnataka Ground Water Authority (KGWA) to drill borewells prior to construction commencement.

The State Expert Appraisal Committee (SEAC) receives the applications and recommends checks and balances, prior to recommending a project for EC to the State Environment Impact Assessment Authority (SEIAA).

The SEIAA reviews project details, clarifies issues and only then is the EC issued. In cases where construction has begun without an EC, the builder is served with a show cause notice. The KSPCB can file cases against builders under the Environment Protection Act if they proceed with construction without an EC.

Read the rest over at Citizen Matters. 

Great work Josephine!

Crosspost: The Hindu’s Rape Statistics Story

A few weeks ago The Hindu’s Data Blog ran a three-part series looking at data on rape cases in Delhi. It was a powerful story that had a lot of people talking, and a good example of what can be done with available data. Rukmini S has written a piece detailing how she combed through the data to get the story.

Below is an excerpt.

How we put together the statistics that went into our investigation

“Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online – and promptly – and these can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know – if you think there was a way I could have done it, do leave me a comment).

For one, while rape cases are sessions-triable, and so should show up as “sessions case” in the nomenclature, for some judges the cases were inexplicably classified as “criminal cases”. Then, while a simple text-search for ‘376’ should have been enough to get me all cases, the text-search function inexplicably collapsed around March 2014. With elections coming up, I had limited time to work on this and had to essentially open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function revived after two months.”
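The manual step described in the excerpt, opening each judgement and searching it for ‘376’, could in principle be done over already downloaded judgement texts. The sketch below is not the method used for the story; it assumes the judgements are available as plain text, and the file names and snippets are made up for illustration. It checks the text itself, so it does not depend on whether a case was filed as a sessions case or a criminal case.

```python
import re

# Match "376" as a standalone number, optionally followed by a sub-section
# such as "376(2)" or "376D", but not inside longer numbers like "13760".
SECTION_376_RE = re.compile(r"(?<!\d)376(?!\d)")


def mentions_section_376(judgement_text):
    """Return True if the judgement text contains a standalone '376'."""
    return SECTION_376_RE.search(judgement_text) is not None


# Hypothetical, made-up snippets standing in for downloaded judgement texts.
judgements = {
    "sessions_case_a.txt": "FIR under Sections 376/506 IPC was registered ...",
    "criminal_case_b.txt": "charged under section 376(2) of the IPC ...",
    "sessions_case_c.txt": "Case No. 13760 under Section 302 IPC ...",
}

matches = [name for name, text in judgements.items() if mentions_section_376(text)]
print(matches)
```

A match only says that the number 376 appears somewhere in the text, so every hit would still need to be read by a person.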

Read the rest here.

Missed Revolution? Or Too much Hype?

There has been an interesting conversation about this article by Priya Rajasekar “India’s Media — Missing the Data Journalism Revolution?” in Global Investigative Journalism Network.

The thread, which you can find here, is a back and forth about whether India’s journalists are really missing out or if this is a global problem that needs more innovative solutions.

Feel free to add your thoughts to the comments or the thread.

Letter to NIC for a data portal to host public contributed datasets

Sumandro drafted a letter to be sent to NIC regarding the possibility of a data portal to host public contributed datasets, that is datasets originating from both governmental and non-governmental sources, but contributed only by non-governmental agencies and individuals.

We sent that letter to NIC this week. Below is a copy of it.

Letter to NIC for a data portal for public contributed datasets