All posts by Nisha Thompson

Meet a DMer: Siddharth Desai


Meet a DMer.

On the DataMeet list we have started referring to each other as DMers, so I wanted to start highlighting people who are interesting and have great insights into open data.

Siddharth Desai is one of our super volunteers. He is steadfast in his commitment to helping out with Open Data Camps and comes to every event in Bangalore that he can. I was really happy to interview him and learn why open data interests him so much.

Where are you from? What do you do?

I am from a town in Goa called Vasco da Gama. I moved to Bangalore 10 years ago for professional reasons. Currently, I am working as a Software Architect with Nokia (formerly NSN). My job involves building solutions in the telecom domain. I do quite a bit of data analysis and visualization as part of my work; the data involved is mostly engineering and planning data.

How did you find out about DataMeet?

I have been following the Open Data Movement for some time now. I realized there were some interesting things happening here in India when I saw the event notification for the first Open Data Camp in 2012. That’s when I heard about DataMeet, and I have been on the list ever since.

Do you believe in open data, and why?

I believe in open data. It’s simply a great leveler. For most of human history, the masses have been fooled and controlled because they didn’t have access to the information that a select few did. Then came Gutenberg, who invented the printing press. Suddenly, knowledge could get out of the confines of a few and into the hands of many. That empowered people and eventually led to greater equity.

The Internet and Wikipedia have done something similar in our times. The Open Data movement is another (huge) step toward putting an end to all unnecessary information asymmetry.

What do you hope to learn? Contribute?

As part of my work, I have acquired the skills for making sense of complex data sets. I am hoping to put those skills to good use by contributing to any initiative that requires support.

Every time I am at a DataMeet or data camp, I learn so much about life: about challenges in the non-technical areas of data, like the social and political contexts around data and information.

What is your impression of the DataMeet community?

Where else do people from such diverse backgrounds meet? We have academics and hackers, NGOs and bureaucrats, journalists and businessmen, designers and more. With such an impressive line-up, there is huge potential to make an impact.

What kind of civic projects do you work on? What kinds of civic projects are you interested in working on?

Really anything that does good. In particular, if anyone has ideas in the medical or healthcare space, I’d be glad to join. I’ve noticed, during various illnesses in the family, that a lot of information on treatment efficacy, side effects, and doctor/hospital failures is shrouded in secrecy. This really needs to be openly available to all for closer scrutiny.

Share a visualization that you saw recently that made a big impression? Share an article you have read recently that made a big impression? (does not have to be data related)

There is a visualization by David McCandless that I love (partly because I enjoy sci-fi a lot). It visualizes time travel in popular films and TV series. Its approach to displaying a non-linear timeline is pretty creative.

Tool Review: WebScraper

Usually when I have any scraping to do, I ask Thej if he can do it and then take a nap. However, Thej is on vacation, so I could either wait for him to come back or try to do it myself. It was basic text, not much HTML, no images, and only a few pages, so I went for it with some non-coder tools.

I checked the School of Data scraping section for some tools, and they have a nice little section on using browser-based scraping tools. I searched the Chrome Web Store and came across WebScraper.

I glanced through the video, sort of paying attention, got the gist of it and started to play with the tool. It took a while to figure out, so I highly recommend going through the tutorials very carefully. The videos take you through the process but are not very clear for complete newbies like me, so it took a few viewings to understand the hierarchy concept and how to adapt their example to the site I was scraping.

I got the hang of doing one page and then figured out how to tell it to go to another page; again, I had to spend quite a bit of time rewatching the tutorial.

At the end of the day, I got the data in neat columns in a CSV without too much trouble. I would recommend WebScraper for people who want to do some basic scraping.

It is as visual as you can get, though the terminology is still very technical. You have to go into the browser's developer tools panel, which can feel intimidating but is ultimately satisfying.

Though I’ll probably still call Thej.

Project Data Playlist

Finding new ways to learn how to play and work with data is always a challenge. Workshops, courses, and sprints are a great way to learn from people. While we will keep trying to bring those events to places around India, we also wanted to use different mediums to put up lessons, tips, techniques and tools.

There is also the additional challenge of reaching out to new communities and people, with different languages and ways of presenting concepts and skills.

We wanted to invite the community and others to experiment in this space by creating video skill sharing playlists.

So instead of a single 10-minute video on how to use Excel, we are asking people to create playlists of videos, each 2 to 5 minutes long and covering a single concept or process.

Anand S presents our first playlist: Formatting in EXCEL:

By breaking the lesson up into chunks and making them separate videos, we are inviting people to add their own.

Don’t like Excel? Do one for Open Spreadsheets or Fusion Tables. Sharing your favorite tools and tricks for working with data is the main goal of this project.

The next step is translating them into different languages and offering different ways to teach a concept.

Next week Thej will present an intro to SQL video.

If you want to do one, there are a few rules:

1) Introduce yourself
2) Break up the lesson by technique and keep each video 2 to 5 minutes long.
3) Make sure they are a playlist.
4) Upload them to YouTube and tag them DataMeet
5) Let us know!

If you have any feedback or a video request please feel free to leave it in the comments. We will hopefully release 2 playlists every month.

Crosspost: Adding stress to a stressed area!

A few weeks ago we held an Intro to Data Journalism Workshop. Josephine Joseph, who regularly writes for Citizen Matters, Bangalore’s local paper that knows all, was in attendance. She was working on this story and published it with Citizen Matters last week. I’m very happy to crosspost it here as a great example of local data journalism.

26 projects could: add 19,000 cars to Whitefield traffic, up water demand by 10.5 million litres

The East Bangalore area, particularly the Whitefield-KR Puram-Mahadevapura stretch, is on the prime real estate map. What are the projects coming up next? What are the implications?

Investing in real estate in Bangalore is a dream of any investor. However, is the growth of this sector in tune with the infrastructure that the city can handle?

A close look by Citizen Matters at 26 projects coming up in the Whitefield-KR Puram area of East Bengaluru reveals some alarming numbers. When the 8,000 flats are fully occupied, new residents will need 10,662.87 KL of water a day (the equivalent of 1,780 water tankers of 6,000 litres each). More than 19,697 cars will be added to Whitefield traffic.
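The water and tanker figures above can be checked with a quick unit conversion. This is our own back-of-the-envelope sketch, not Citizen Matters' method; it uses only the two numbers quoted in the article (10,662.87 KL per day and 6,000-litre tankers), and treating a partial load as a whole tanker is our assumption.

```python
import math

# Figures quoted in the Citizen Matters piece
daily_demand_kl = 10_662.87   # kilolitres per day once the 8,000 flats are occupied
tanker_capacity_l = 6_000     # litres per water tanker

daily_demand_l = daily_demand_kl * 1_000      # 1 KL = 1,000 litres
tankers_exact = daily_demand_l / tanker_capacity_l
tankers_needed = math.ceil(tankers_exact)     # assumption: a partial load still needs a whole tanker

print(f"{daily_demand_l / 1e6:.2f} million litres/day")
print(f"{tankers_exact:.1f} tanker loads, i.e. {tankers_needed} tankers")
```

It works out to about 10.66 million litres a day and roughly 1,777 tanker loads, so 1,778 whole tankers, which the article rounds to 1,780.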

Ministry of Environment and Forests (MoEF) rules require builders of projects with more than 20,000 sqm of built-up area to apply for an Environmental Clearance (EC) from the state, along with all the other permissions and NOCs from BBMP, BWSSB and the Karnataka Ground Water Authority (KGWA) to drill borewells, before construction begins.

The State Expert Appraisal Committee (SEAC) receives the applications and recommends checks and balances, prior to recommending a project for EC to the State Environment Impact Assessment Authority (SEIAA).

The SEIAA reviews project details and clarifies issues, and only then is the EC issued. Where construction has begun without an EC, the builder is served a show cause notice. The Karnataka State Pollution Control Board (KSPCB) can file cases against builders under the Environment Protection Act if they proceed with construction without an EC.

Read the rest over at Citizen Matters. 

Great work Josephine!

Notes from first Data BootCamp India

This has been crossposted from Thej GN’s blog.

“The first ever DataBootCamp in India was organized by ICFJ in collaboration with Data{Meet}, HT, Hacks/Hackers New Delhi and 9.9 School of Journalism in Delhi. It was a three-day event hosted by Bridge School of Management. It was an interesting gathering, as more than 50% of the participants were from a journalistic background. I have never seen such a big group of journalists in one place for three days, working in groups with people of different backgrounds.

A major part of the camp was proposing projects/stories and working on them. The group selected ten of the proposed projects. I have listed them below, hyperlinked to the end results. If you'd like to see all the proposed projects, go to HackDash.


  1. Narendra Modi On Twitter Vs Other Global Leaders – Word Play vs Ground Reality
  2. Crime Against Women In India
  3. Class Calculator – Think you’re in the middle class? Use the class calculator. Scroll down to find out. You may be surprised. Or Not.
  4. Cashless In India – Is India becoming a #cashlesseconomy?
  5. Terror Statistics
  6. Money poured into Ganga vs pollution levels
  7. India’s Supreme Court Ruling on Under-Trial Prisoners
  8. Media Ownership
  9. Advertising For Online Video To Rise By 30%
  10. Build Hospitals To Kill Cancer

Of course, we also had hands-on workshops on scraping, data cleaning, data visualization and mapping. I will probably need a series of posts to cover them all. I have put the relevant links at the bottom for you to explore. The best part was that some of the participants used the tools they learnt during the camp for their project work.

Other interesting facts/links/tools that I came across during the event:

Overall, I was surprised at the quality of the projects. At least half of them were executed very well. Two days is a very short amount of time, so hats off to all the participants. As a participant and duct-tape programmer/trainer, I had lots of fun. I hope there will be more collaboration between the tech and journalism communities in the future.”

See Thej’s post for more pictures.  Also if you were at the event and have a post please let us know!

 

Data Journalism Workshop #1

Last Sunday, August 31st, Thej and I worked with Economic Times journalist Jayadevan PK to design an intro to data journalism workshop. For a while now there has been quite a bit of interest in and discussion of data journalism in India. There are currently a few courses and events promoting data journalism, but we thought there was definitely room to start building a few modules on working with data for storytelling. Since we have not run many of these, we decided to do an introduction and limit it to a few people.


You can see the agenda with notes here and the resources we shared on the data journalism resource wiki page, as well as refer to the data catalog that DataMeet has been putting together.

Thanks to Knolby Media for hosting us, and to School of Data (I am a fellow). Thank you to Vikras Mishra for volunteering and taking notes, pictures, and video.

We had four storytellers with us, from various backgrounds. We spent the morning on introductions: their experience with data, their definition of data journalism, and why they wanted to take this workshop. Then we had them put up their expectations so we could gauge what the afternoon should focus on.

 


We then had Jaya go through the context of data journalism on a global scale and in the new digital journalism era.

Then we spent some time going over examples of good data journalism and bad.

After that, we went through resources people can use to get data. We touched upon the legal and copyright issues around using data, then discussed accuracy and how to properly attribute sources.

Then we demonstrated a few tools:


Tableau
CartoDB
Scraping tools
ScraperWiki
iMacros
MapBox
QGIS

Visualization Roadmap
The participants thought understanding how to visualize data would be helpful, so we went through a sort of visualization roadmap. Then we went through the stories they were working on to see how we would create a visualization, and how to examine the data and come up with a data strategy for each story.


Then we showed some more tools to address the suggestions from the exercise:
BHUVAN
Timelines
Odyssey
Fusion Tables
BUMP

Feedback session

People wanted another day to absorb the lessons and more hands-on time with the tools. Also, even at the intro level, it is important that people come prepared with stories, so they have something to apply the ideas to.

To say we learned a lot is an understatement. We will definitely be planning more intro workshops, and hopefully more advanced ones in the future. We hope to keep learning what people think is important, and we will keep track of the stories that come out of these learning sessions.

If you want a particular workshop feel free to request one here.  Stay tuned to the blog and to the list to hear about the next one.

OPEN DATA INDIA WATCH – 10

Stories of the week

Upcoming Events

Happening now.  Hacking the Budget: Data BootCamp in Delhi! Follow it here.

HillHacks in Oct 

A two-week-long event in Dharamsala, India, with participants from all around India and the world. It’s organized by CCC and freaklabs, along with participants from MIT Media Lab, Tibetan Children’s Village, Dharamsala International Film Festival, Ghoomakad hackerspace, and many more.

Week’s Events

We had a DataMeet in Mumbai!

Bangalore Screened the Aaron Swartz Documentary and then discussed the Indian experience.  

GeoBLR discussed the saga of Pincodes.

 

Bangalore: Screening of The Internet’s Own Boy

Last Thursday the Bangalore DataMeet did a screening of the Aaron Swartz Documentary: The Internet’s Own Boy.

Aaron was a developer, technologist, entrepreneur, and passionate open culture and progressive activist who had been instrumental in creating Creative Commons and Reddit. Last year he took his life in the wake of aggressive prosecution by the US Government for downloading academic journals through MIT’s network. The open culture, open access and open data movements suffered a great loss, and also had to pause and take time to understand what the government's actions meant.

We wanted to show the movie here and then discuss the Indian context: can this happen here? Can people who believe in open access be targeted as well? The group at the screening was small, and we spent the evening discussing THE KARNATAKA PREVENTION OF DANGEROUS ACTIVITIES OF BOOTLEGGERS, DRUG-OFFENDERS, GAMBLERS, GOONDAS, IMMORAL TRAFFIC OFFENDERS AND SLUM-GRABBERS ACT, 1985, better known as the Goonda Act (Goonda is a slang term for gangster).

The Goonda Acts are state-level laws that provide a legal definition of what a Goonda is in several situations and prescribe how the police may deal with them. Karnataka's law was enacted in 1985 and was recently amended to cover new categories of offenders, one of them being the digital offender.

The 1985 act includes the following:

When can the Goonda Act be invoked?

“Explanation.– For the purpose of this clause, public order shall be deemed to have been affected adversely or shall be deemed likely to be affected adversely inter alia if any of the activities of any of the persons referred to in this clause directly or indirectly, is causing or is calculated to cause any harm, danger or alarm or a feeling of insecurity, among the general public or any section thereof or a grave or widespread danger to life or public health.”

What powers does the state have?

“3. Power to make orders detaining certain persons.- (1) The State Government may, if satisfied with respect to any bootlegger or drug-offender or gambler or goonda or immoral traffic offender or slum-grabber that with a view to prevent him from acting in any manner prejudicial to the maintenance of public order, it is necessary so to do, make an order directing that such persons be detained.”

When is this Act valid or invalid?

(a) such order shall not be deemed to be invalid or inoperative merely because one or some of the grounds is or are,
(i) vague;
(ii) non-existent;
(iii) not-relevant;
(iv) not connected or not proximately connected with such person; or
(v) invalid for any other reason whatsoever;
and it is not, therefore, possible to hold that the Government or the officer making such order would have been satisfied as provided in sub-section (1) of section 3 with reference to the remaining ground or grounds and made the order of detention;

How long can they detain you?

13. Maximum period of detention.- The maximum period for which any person may be detained, in pursuance of any detention order made under this Act which has been confirmed under section 12 shall be twelve months from the date of detention.

Provided that in a case where no fresh facts have arisen after the revocation or expiry of the earlier detention order made against such person, the maximum period for which such person may be detained in pursuance of the subsequent detention order shall in no case, extend beyond the expiry of a period of twelve months, from the date of detention under the earlier detention order.

How can you address the system for wrongful detention?

16. Protection of action taken in good faith.- No suit, prosecution or other legal proceeding shall lie against the State Government or any officer or person, for anything in good faith done or intended to be done in pursuance of this Act.

This Act gives the state a very powerful tool when it comes to dealing with people that have been deemed Goondas.

The 2014 amendment added more categories to the list of potential offenders, including people suspected of rape and acid attacks. It also added a new category: the digital offender.

What is a digital offender?

“Any person who knowingly or deliberately violates, for commercial purposes, any copyright law in relation to any book, music, film, software, artistic or scientific work and also includes any person who illegally enters through the identity of another user and illegally uses any computer or digital network for pecuniary gain for himself or any other person or commits any of the offences specified under sections 67, 68, 69, 70, 71, 72, 73, 74 and 75 of the Information Technology Act, 2000”.


Several questions come to mind:

  • How has this act been used in the past?
  • Why was there a push to include digital offenders? In some articles it seems software companies are trying to go after piracy.
  • The definitions are vague and can be applied in a lot of instances. If I send my friend a copy of a song that I have purchased, can I now be taken to jail for 12 months?

“The law applies not only to audio and video pirates, but to Facebook, Twitter and WhatsApp users too. Here is how the report explains it: ‘If govt thinks you are planning to send a “lascivious” photo to a WhatsApp group, or forwarding a copyrighted song, you can be arrested.’” – One India

According to the Economic Times, both law enforcement and software companies support adding the digital offender category.

The Goonda Act is much more stringent and is expected to bring down the offences considerably, said a police inspector in Bangalore who has dealt with cases of offences under the IT Act. “In future, we are likely to see more offences that are digital in nature. It is probably to effectively deal with such crimes that the government has proposed this amendment. It is more futuristic in its outlook, and is likely to help Bangalore in a big way,” said the inspector, who did not wish to be identified. According to Naidu, the very mention of the name Goonda Act creates some sort of a fear psychosis among people.

“Right now, many people seem to have a casual attitude to digital offences. If the fear of Goonda Act works, it will not just boost the sale of our products but in the process increase the tax revenues of the government,” he said.

The amendment has bolstered the confidence of Bangalore-based start-ups like MRT Studios. “We do a lot of post-production work for films, and visual effects for films and television. While we provide services to our clients by investing in original software, there are others who do the same work using the pirated software for a fraction of the price that we charge. The fear of police will now force everyone to go for legal software,” said M Naveen Kumar, 31-year old founder of the seven-month old company.

There are two sides to every law, and what Aaron Swartz’s experience shows us is that anything is possible and that intent is not always taken into account.

How do we make sure that the intent of the law is carried out and that people without malicious intent aren’t being unfairly targeted?

How do we examine the Copyright Act and the IT Act, and make sure people understand what they entail and know what they are and are not allowed to do?

You can see the movie at this link.  You can see our notes from the meet up here.

Please feel free to leave a comment or add your thoughts to the hackpad.

Crosspost: The Hindu’s Rape Statistics Story

A few weeks ago The Hindu’s Data Blog ran a three-part series looking at data on rape cases in Delhi. It was a powerful story that got a lot of people talking, and a good example of what can be done with the data available. Rukmini S has written a piece detailing how she combed through the data to get the story.

Below is an excerpt.

How we put together the statistics that went into our investigation

“Delhi is better than most Indian cities for legal data journalism because it puts all district court judgements online – and promptly – and these can be text-searched. Ideally, I should have been able to scrape all judgements for ‘376’, the IPC section related to rape. However, I encountered a ton of issues that would have rendered a scraping tool useless (as far as I know – if you think there was a way I could have done it, do leave me a comment).

For one, while rape cases are sessions-triable, and so should show up as ‘sessions case’ in the nomenclature, for some judges the cases were inexplicably classified as ‘criminal cases’. Then, while a simple text-search for ‘376’ should have been enough to get me all cases, the text-search function inexplicably collapsed around March 2014. With elections coming up, I had limited time to work on this and had to essentially open every single sessions court judgement and search for ‘376’ in each one. Luckily, the search function revived after two months.”
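Once judgements are saved locally, the search Rukmini describes boils down to a text scan that ignores the inconsistent case labels. The sketch below is a hypothetical illustration, not how she worked (she searched each judgement by hand). It assumes the judgements have already been downloaded and converted to plain-text files in one folder; the folder name `judgements` and the function `find_section_376` are made up for this example.

```python
import re
from pathlib import Path

# Matches "376" only when it is not part of a longer number, so "s. 376",
# "376(2)" and "376A" count, while "1376" or "3760" do not.
SECTION_376 = re.compile(r"(?<!\d)376(?!\d)")


def find_section_376(folder):
    """Return the sorted names of .txt judgements in `folder` that mention 376.

    Every file is checked, whether it was filed as a "sessions case" or a
    "criminal case", because those labels were applied inconsistently.
    """
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if SECTION_376.search(text):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Hypothetical folder of already-downloaded, plain-text judgements
    for name in find_section_376("judgements"):
        print(name)
```

Matching on the section number rather than the case label is the point of the design: a misfiled case still contains the charge, so it is not silently dropped. A false positive (say, a page number 376) is still possible, so each hit would need a human read.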

Read the rest here.

Missed Revolution? Or Too much Hype?

There has been an interesting conversation about this article by Priya Rajasekar “India’s Media — Missing the Data Journalism Revolution?” in Global Investigative Journalism Network.

The thread, which you can find here, is a back and forth about whether India’s journalists are really missing out or if this is a global problem that needs more innovative solutions.

Feel free to add your thoughts to the comments or the thread.