Data Journalism Workshop #1

Last Sunday, August 31st, Thej and I worked with Jayadevan PK, a journalist at the Economic Times, to design an intro to data journalism workshop. For a while now there has been quite a bit of interest in and discussion of data journalism in India. Currently there are a few courses and events promoting data journalism, but we thought there was definitely room to start building a few modules on working with data for storytelling. Given that we have not done too many of these, we decided to do an introduction and limit it to a few people.

You can see the agenda with notes here and the resources we shared on the data journalism resource wiki page, as well as refer to the data catalog that DataMeet has been putting together.

Thanks to Knolby Media for hosting us and to School of Data (I am a fellow). Thank you to Vikras Mishra for volunteering and taking notes, pictures, and video.

We had four storytellers with us, from various backgrounds. We spent the morning on introductions: what their experience with data was, what their definition of data journalism is, and why they wanted to take this workshop. Then we had them put up some expectations so we could gauge what the afternoon should focus on.

 

We then had Jaya go through the context of data journalism at the global scale and in the new digital journalism era.

Then we spent some time going over examples of good data journalism and bad.

After that, we went through resources people can use to get data. We touched upon the legal and copyright issues around using data. Then we discussed accuracy and how to properly attribute sources.

Then we demonstrated a few tools:

Tableau
CartoDB
Scraping tools
Scraper wiki
IMACROS
MapBox
QGIS

Visualization Roadmap
The participants thought understanding how to visualize would be helpful, so we went through a sort of visualization roadmap. We then went through the stories they were working on to see how we would create a visualization, how to examine the data, and how to come up with a data strategy for each story.

We then showed some more tools to address the suggestions from the exercise:
BHUVAN
Timelines
Odyssey
Fusion Tables
BUMP

Feedback session

People wanted another day to let the lessons be absorbed and some more hands-on time with the tools. Also, even at the intro level it is important to have people come prepared with stories, so they have something to apply the ideas to.

To say we learned a lot is an understatement. We will definitely be planning more intro workshops and, hopefully, more advanced workshops in the future. We hope to continue to learn what people think is important, and we will keep track of what kinds of stories come out of these learning sessions.

If you want a particular workshop feel free to request one here.  Stay tuned to the blog and to the list to hear about the next one.

OPEN DATA INDIA WATCH – 10

Stories of the week

Upcoming Events

Happening now.  Hacking the Budget: Data BootCamp in Delhi! Follow it here.

HillHacks in Oct 

“a two week long event in Dharamsala, India with participants from all around India and all around the world. It’s organized by CCC and freaklabs as well as participants from MIT Media Lab, Tibetan Children’s Village, Dharamsala International Film Festival, Ghoomakad hackerspace, and many more.”

Week’s Events

We had a DataMeet in Mumbai!

Bangalore screened the Aaron Swartz documentary and then discussed the Indian experience.

GeoBLR discussed the saga of Pincodes.

 

Bangalore: Screening of The Internet’s Own Boy

Last Thursday the Bangalore DataMeet did a screening of the Aaron Swartz Documentary: The Internet’s Own Boy.

Aaron was a developer, technologist, entrepreneur, and a passionate open culture and progressive activist, who had been instrumental in creating Creative Commons and Reddit. Last year he took his life in the wake of aggressive prosecution by the US Government for downloading academic journals through MIT’s network. The open culture/access/data movement suffered a great loss, but it also had to pause and take time to understand what the actions taken by the government meant.

We wanted to show the movie here and then have a discussion on the Indian context: can this happen here? Can people who believe in open access be targeted as well? The group at the screening was small, and we spent the evening discussing THE KARNATAKA PREVENTION OF DANGEROUS ACTIVITIES OF BOOTLEGGERS, DRUG-OFFENDERS, GAMBLERS, GOONDAS, IMMORAL TRAFFIC OFFENDERS AND SLUM-GRABBERS ACT, 1985, better known as the Goonda Act (Goonda is a slang term for gangster).

The Goonda Acts are basically state-level laws that provide a legal definition of what a Goonda is in several situations and prescribe ways the police are allowed to deal with them. The law in Karnataka was enacted in 1985 and was recently amended to add new provisions and new categories of offender, one being the digital offender.

The 1985 act includes the following:

When can the Goonda Act be invoked?

“Explanation.– For the purpose of this clause, public order shall be deemed to have been affected adversely or shall be deemed likely to be affected adversely inter alia if any of the activities of any of the persons referred to in this clause directly or indirectly, is causing or is calculated to cause any harm, danger or alarm or a feeling of insecurity, among the general public or any section thereof or a grave or widespread danger to life or public health.”

What powers does the state have?

“3. Power to make orders detaining certain persons.- (1) The State Government may, if satisfied with respect to any bootlegger or drug-offender or gambler or goonda or immoral traffic offender or slum-grabber that with a view to prevent him from acting in any manner prejudicial to the maintenance of public order, it is necessary so to do, make an order directing that such persons be detained.”

When is this Act valid or invalid?

(a) such order shall not be deemed to be invalid or inoperative merely because one or some of the grounds is or are,
(i) vague;
(ii) non-existent;
(iii) not-relevant;
(iv) not connected or not proximately connected with such person; or
(v) invalid for any other reason whatsoever;
and it is not, therefore, possible to hold that the Government or the officer making such order would have been satisfied as provided in sub-section (1) of section 3 with reference to the remaining ground or grounds and made the order of detention;

How long can they detain you?

13. Maximum period of detention.- The maximum period for which any person may be detained, in pursuance of any detention order made under this Act which has been confirmed under section 12 shall be twelve months from the date of detention.

Provided that in a case where no fresh facts have arisen after the revocation or expiry of the earlier detention order made against such person, the maximum period for which such person may be detained in pursuance of the subsequent detention order shall in no case, extend beyond the expiry of a period of twelve months, from the date of detention under the earlier detention order.

What recourse do you have against wrongful detention?

16. Protection of action taken in good faith.- No suit, prosecution or other legal proceeding shall lie against the State Government or any officer or person, for anything in good faith done or intended to be done in pursuance of this Act.

This Act gives the state a very powerful tool when it comes to dealing with people that have been deemed Goondas.

The 2014 Amendment added more to the list of potential offenders, including people who are suspected of rape and acid attacks. It also added a new category, the Digital Offender.

What is a digital offender?

“Any person who knowingly or deliberately violates, for commercial purposes, any copyright law in relation to any book, music, film, software, artistic or scientific work and also includes any person who illegally enters through the identity of another user and illegally uses any computer or digital network for pecuniary gain for himself or any other person or commits any of the offences specified under sections 67, 68, 69, 70, 71, 72, 73, 74 and 75 of the Information Technology Act, 2000”.


Several questions come to mind:

  • How has this act been used in the past?
  • Why was there a push to include digital offenders? In some articles it seems software companies are trying to go after piracy.
  • The definitions are vague and can be used in a lot of instances.  If I send my friend a copy of a song that I have purchased, can I now be taken to jail for 12 months?

“The law applies not only to audio and video pirates, but to Facebook, twitter, Whatsapp users too. Here is how the report explains it: “If govt thinks you are planning to send a ‘lascivious’ photo to a WhatsApp group, or forwarding a copyrighted song, you can be arrested”.” – One India

According to the Economic Times, there is support for adding the digital offender category among law enforcement as well as software companies.

“The Goonda Act is much more stringent and is expected to bring down the offences considerably,” said a police inspector in Bangalore who has dealt with cases of offences under the IT Act. “In future, we are likely to see more offences that are digital in nature. It is probably to effectively deal with such crimes that the government has proposed this amendment. It is more futuristic in its outlook, and is likely to help Bangalore in a big way,” said the inspector, who did not wish to be identified. According to Naidu, the very mention of the name Goonda Act creates some sort of a fear psychosis among people.

“Right now, many people seem to have a casual attitude to digital offences. If the fear of Goonda Act works, it will not just boost the sale of our products but in the process increase the tax revenues of the government,” he said.

The amendment has bolstered the confidence of Bangalore-based start-ups like MRT Studios. “We do a lot of post-production work for films, and visual effects for films and television. While we provide services to our clients by investing in original software, there are others who do the same work using the pirated software for a fraction of the price that we charge. The fear of police will now force everyone to go for legal software,” said M Naveen Kumar, 31-year old founder of the seven-month old company.

There are two sides to every law, and what Aaron Swartz’s experience shows us is that anything is possible and that intent is not always taken into account.

How do we make sure that the intent of the law is carried out and that people without malicious intent aren’t being unfairly targeted?

How do we examine the Copyright Act and the IT Act and make sure people understand what they entail and know what they are and are not allowed to do?

You can see the movie at this link.  You can see our notes from the meet up here.

Please feel free to leave a comment or add your thoughts to the hackpad.

Mumbai’s Data Meet Kicks Off With A Bang

Written by Sanjit Oberai
Mumbai’s first Data Meet kicked off on 30th August at the Sardar Patel Institute of Technology with a total of 26 people attending the event. It started off with a round of introductions by the attendees, a mix of developers, journalists, students and data enthusiasts.
There was an introduction by Ritvvij, who is the founder of Pykih, where he spoke about how important it is to have a data group in Mumbai. (Listen to the recording here.)
The first talk was by Ajaj Kelkar, cofounder of http://hansacequity.com. He gave an introduction to how the recent Open Data movement started about 5-7 years back in the US, as there was a need to move data from the private space into the public space, which was made possible by an active push from transparency groups. The idea spread, and many progressive countries realised that this could be a powerful movement that can be used for public good.
Data can be used to help take decisions on social or personal platforms. He highlighted that there are barriers that we need to overcome. He explained how many cities abroad have appointed Chief Data Officers whose job is to monitor data in each city, e.g. municipal budgets.
Another important point he spoke about was privacy. As consumers we are leaving a lot of data out there on social media platforms. In the absence of proper laws, we need to be careful about what we put out there, and this is an area we need to think carefully about.
The second talk was by Ritvvij, founder of http://www.pykih.com, who explained the importance of choosing how to visualise. He emphasised that a lot of people are not aware of how to use the correct tools to visualise data, and that the most common mistake people make is with the humble pie chart. He went on to explain the visualisation mistakes that many news organisations are making today and what they could do to improve their charts and graphs.
He also gave examples of his recent work with Firstpost.com, where he made custom visualisations for the elections and the IPL. He also worked with narendramodi.in.
He also spoke about the issues most journalists face when dealing with government data and how difficult it is for them to get access to it. He ended by stating that there is a dire need for a tool that could become the CMS for data journalism, thereby allowing journalists to focus on the story rather than on data janitorial work.
The third talk was by Sanjit Oberai, Deputy Editor of IndiaSpend, a non-profit that uses data to tell stories. He spoke about how there are tonnes of stories buried in government data and how to write articles around them. He spoke about how they research articles, what the sources of data are, and how one can visualise data using free-to-use tools like Data Wrapper, Knoema, Tableau, etc.
He also spoke about a new initiative called Fact Check, which can be used to bring about accountability and raise the common man’s awareness. He cited the example of the Goa MLAs who were going to Brazil and how that created quite a stir, with the Congress calling it wasteful expenditure. A quick fact check was done on the assets they had declared in the sworn affidavits candidates provided before the legislative elections in 2012.
He also spoke of a Data Room, a first-of-its-kind resource for students, journalists and researchers that will allow comparison of state-wise data like population, health, education, etc.
The last talk was by Srinivas Kodali, an IIT-Madras graduate, who is researching transport data of cities. He explained how one could scrape data from websites and demoed tools like Selenium for scraping sites that generate data on the fly using AJAX. However, Selenium is a front-end tool typically used for testing and requires an open browser session, so it cannot be used for large-scale scraping. He then went on to showcase PhantomJS, which allows scraping as Selenium would but in a headless fashion, i.e. without the browser UI.
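As a rough illustration of the headless approach described in the talk, here is a minimal Python sketch. It is not the code shown at the meet: it swaps in headless Firefox driven through Selenium instead of PhantomJS, and the function names, the `table tr` selector and the idea that the data sits in an HTML table are all assumptions made for the example. The browser renders the page so that the AJAX-loaded rows exist, and a small standard-library parser then pulls the cells out of the rendered HTML.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


def extract_rows(html):
    """Return table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def render_page(url, wait_css="table tr", timeout=10):
    """Load url in headless Firefox and return the HTML after scripts ran.

    Assumes Selenium and Firefox/geckodriver are installed; wait_css is a
    hypothetical selector for the element the AJAX call fills in.
    """
    # Imported here so extract_rows() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # no browser window is opened
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until the dynamically loaded rows are present in the page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()
```

With a real page, usage would look like `rows = extract_rows(render_page(url))`, where `url` is whatever site you are scraping. Keeping the parsing separate from the browser step means the parser can be tried out on saved HTML without starting a browser.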
The next meeting will be held at the end of September. Details will be posted on the site soon.

GeoBLR – PIN Code Extravaganza!

Last week at GeoBLR we discussed the issues around PIN codes. The most important questions were about the processes of the postal system and about the availability of reliable spatial data.

A couple of weeks back, Nisha and I started putting together several questions that we would like to get insights on. We used that as the starting point for the discussions. The meat of the problem really is that nobody knows what the processes are or how to get that information.

Prior to GeoBLR, we met some people who are interested in the same issue and clarified a lot of things. For instance, we are now sure that sometimes a single post office can deal with more than one PIN code.

To get a sense of how people felt about the PIN code issues, we asked around. Some people don’t bother to use PIN codes for any substantial service other than sending post cards. As long as we are not able to tie PIN codes to geographic locations reliably, they are not so useful. Everybody agrees that the PIN code has immense potential, just because it’s the only part of the address that everybody gets right (most of the time).

We also started to brainstorm how to come up with a plan so that a group like ours along with several other partners could work together to attempt to crowdsource the issue. Read more about the plan and next steps here!

Open Data India Watch – 9

Stories

Tools/Tutorials

Events

Screening: The Internet’s Own Boy – Aaron Swartz Documentary and discussions

All of us in the DataMeet group are aware of Aaron Swartz, so I thought it would be a great idea to watch the documentary together and share ideas. We are screening the movie this coming Thursday. Please make it if possible, and RSVP on the meetup page. If you can’t make it, the movie is on archive.org for everyone to download and watch. You can also visit the movie project page for more details.

http://www.meetup.com/DataMeet/events/201596972/

Crosspost: The Hindu’s Rape Statistics Story

A few weeks ago The Hindu’s Data Blog had a three-part series looking at data on rape cases in Delhi. It was a powerful story that had a lot of people talking and a good example of what can be done with available data. Rukmini S has written a piece detailing how she combed through the data to get the story.

Below is an excerpt.

How we put together the statistics that went into our investigation

“Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online – and promptly – and these can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know – if you think there was a way I could have done it, do leave me a comment).

For one, while rape cases are sessions-triable, and so should show up as “sessions case” in the nomenclature, for some judges the cases were inexplicably classified as “criminal cases”. Then, while a simple text-search for ‘376’ should have been enough to get me all cases, the text-search function inexplicably collapsed around March 2014. With elections coming up, I had limited time to work on this and had to essentially open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function revived after two months.”

Read the rest here.
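For readers wondering what the manual step in the excerpt might look like if scripted, here is a small Python sketch. It is not Rukmini's method, and it does not solve the hard part she describes (getting the judgements off the court site). It assumes, hypothetically, that the judgements have already been saved as `.txt` files in one folder; the function names are made up for the example. Because it searches the judgement text itself, it does not depend on whether a case was labelled “sessions case” or “criminal case”.

```python
import re
from pathlib import Path


def mentions_section(text, section="376"):
    """True if the text contains the section number as a standalone number."""
    # The lookarounds stop "376" from matching inside longer numbers
    # such as a case number like 13760.
    pattern = re.compile(r"(?<!\d)" + re.escape(section) + r"(?!\d)")
    return pattern.search(text) is not None


def find_judgements(folder, section="376"):
    """Return the names of .txt files in folder that mention the section."""
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if mentions_section(text, section):
            hits.append(path.name)
    return hits
```

A match only says that the number appears somewhere in the judgement. It could still be a rupee amount or a page reference, so each hit would need to be read, much as in the manual process.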

Open Data India Watch – 7

Stories

Stories – World

Events

  • GeoBlr – Meetup – PINCODE EXTRAVAGANZA! is on Aug 12. Please do RSVP.
  • Monthly DataMeetUp Delhi is on Aug 27
  • DataBootCamp India: Hack the Budget at the capital’s biggest data journalism event yet, from September 5 to 7. The Open Data Bootcamp is co-hosted by the International Center for Journalists, the Hindustan Times, Hacks/Hackers New Delhi, Data{Meet}, and the 9.9 School of Communication. The three-day event will take place at the Bridge School of Management at Cyber City in Gurgaon.

Suggest stories to us

You can suggest stories, applications, events etc by tweeting at @datameet or by emailing.

DataMeet is a community of Data Science and Open Data enthusiasts.