Category Archives: How-To

Making A Football Data Viz With D3 and Reveal.js

This is a write-up on how I made a slideshow for the Under-17 World Cup.

The U-17 World Cup is the first-ever FIFA tournament to be hosted by India. Like many of you, I’ve seen plenty of men’s World Cups, but never a U-17 one. To try and understand how the U-17 tournament might be different from the ‘senior’ version, I compared data from the last U-17 World Cup held in Chile in 2015 and the last men’s World Cup in Brazil in 2014.

The data was taken from Technical Study Group reports that are published by FIFA after every tournament. (The Technical Study Group is a mixture of ex-players, managers and officials associated with the game. You can read more about the group here.)

In particular, I used the reports for the 2014 World Cup and the 2015 U-17 World Cup. The data was taken pretty much as is, and thankfully didn’t have to be processed much. An example of the data available in the report can be seen in the image below. It shows how the 171 goals in the 2014 World Cup came about.

A look at some of the data in the report

The main takeaway from the comparison with the men’s World Cup is that the U-17 World Cup might see more goals and fewer 0-0 draws on average. The flipside is that there could be more cards and penalties too. For more details, check the slideshow.

BE LESS INTIMIDATING FOR READERS

I know just using one World Cup each to represent men’s and U-17 football may not be particularly rigorous. We could have also used data from the previous three or four World Cups in each age format. But if I did that, I was scared the data story would become more dense and intimidating for readers. I wanted to make this easy to follow along and understand, which is why I simplified things this way.

A card from the slideshow

Another thing I did to make this easier to digest was to stick to one main point per card (see image above). The main point is in the headline, then you get a few lines of text below showing how exactly you’ve arrived at the main point. The figures that have been calculated and compared are put in a bold font. Then there is an animated graphic below that, which visually reinforces the main point of the slide.

The data story tries to simulate a card format, one that you can just flick through on a mobile phone. I used the slideshow library reveal.js to make the cards. But I suspect mobile developers have a standard, more established method for creating a card format. I will have to look into this further.

The animations were done with D3.js, with help from a lot of examples on stackoverflow and bl.ocks.org. If you’re new to D3 and want to know how these animations were done, here’s more info.

ANIMATING THE BAR CHART

The D3 ‘transitions’ or animations in this slideshow are basically the same. There’s (a) an initial state where there’s nothing to see, (b) the final state where the graphic looks the way you want and (c) a transition from the initial state to the final state over a duration specified in milliseconds.

A snippet of code for animating the bars

For example, in the code snippet for the bar animation above, you see two attributes changing for the bars during the transition—the ‘height’ and ‘y’ attributes changing over a duration of 500 milliseconds. You can see another example of this animation at bl.ocks.org here.
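If you can’t see the screenshot, the pattern described above looks roughly like the sketch below. This is not the slideshow’s actual code: `growBars`, `yScale`, `chartHeight` and the `value` field are hypothetical names, and `bars` is assumed to be a D3 selection of `rect` elements with data already bound to them.

```javascript
// Minimal sketch of a bar "grow" transition (hypothetical names, not the slideshow's code).
// `bars`: assumed to be a D3 selection of <rect> elements with data bound.
// `yScale`: maps a data value to a pixel position; `chartHeight`: plot height in pixels.
function growBars(bars, yScale, chartHeight, duration = 500) {
  return bars
    // (a) initial state: zero-height bars sitting on the baseline, so nothing is visible
    .attr("y", chartHeight)
    .attr("height", 0)
    // (c) move from the initial to the final state over `duration` milliseconds
    .transition()
    .duration(duration)
    // (b) final state: each bar reaches up to its value
    .attr("y", (d) => yScale(d.value))
    .attr("height", (d) => chartHeight - yScale(d.value));
}
```

Both ‘y’ and ‘height’ have to change because SVG measures ‘y’ from the top of the canvas: as a bar gets taller, its top edge also has to move up. A call might look like `growBars(svg.selectAll("rect"), y, height)`, where `svg`, `y` and `height` are whatever your chart already defines.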

ANIMATING THE STACKED BAR CHART

This animation was done in a way similar to the one above. The chart is called a ‘normalised stack chart’ and the code for this was taken from the bl.ocks.org example here.

The thing about this chart is that you don’t have to calculate the percentages beforehand. You just feed in the raw data (see image below) and you get the final percentages visualised in the graphic.

The raw data on goals gets converted to percentages
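To make the conversion concrete, here is a small sketch of the arithmetic the chart does for you. The function name, category names and numbers are made up for illustration and are not taken from the FIFA reports. (From version 4 onwards, D3 also ships a stack offset called `d3.stackOffsetExpand` that performs this kind of normalisation, scaled from 0 to 1 rather than 0 to 100.)

```javascript
// Convert one row of raw counts into percentage shares that add up to 100.
// The category names and numbers below are invented for illustration.
function toPercentages(row) {
  const total = Object.values(row).reduce((sum, n) => sum + n, 0);
  const shares = {};
  for (const [key, count] of Object.entries(row)) {
    // Guard against an empty row so we never divide by zero
    shares[key] = total === 0 ? 0 : (count / total) * 100;
  }
  return shares;
}

const example = toPercentages({ openPlay: 30, setPiece: 10 });
// example.openPlay is 75 and example.setPiece is 25
```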

ANIMATING THE LINE CHART

The transition over here isn’t very sophisticated. In this, the two lines and the data points on them are basically set to appear 300 milliseconds and 800 milliseconds respectively after the card appears on screen (see the code snippet below).

A snippet of code for changing the opacity of the line
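In case the screenshot isn’t visible, a delayed appearance of this kind can be sketched as below. `fadeInAfter` is a hypothetical helper and not the slideshow’s own code; `selection` is assumed to be a D3 selection, such as the line paths or the data-point circles.

```javascript
// Sketch of a delayed appearance (hypothetical helper, not the slideshow's code).
// `selection` is assumed to be a D3 selection.
function fadeInAfter(selection, delayMs) {
  return selection
    .style("opacity", 0)   // start invisible
    .transition()
    .delay(delayMs)        // wait this long before the change starts
    .style("opacity", 1);  // no duration is set, so D3's default of 250 ms applies
}
```

With this helper, the two steps described above would be something like `fadeInAfter(svg.selectAll(".line"), 300)` and `fadeInAfter(svg.selectAll("circle"), 800)`, where the selectors are assumptions about how the chart is built.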

A cooler line animation would have been ‘unrolling’ the line as seen in this bl.ocks.org example. Maybe next time!

ANIMATING THE PIE CHART

Won’t pretend to understand the code used here. I basically just adapted this example from bl.ocks.org and played around with the parameters till it came out the way I wanted. This example is from Mike Bostock, the creator of D3.js, and in it he explains his code line by line (see image below). Do look at it if you want to fully understand how this pie chart animation works.

Commented code from Bostock

ANIMATING THE ISOTYPE CHART

Yup, this chart is called an isotype chart. This animation is another one where the transition uses delays. So if you look in the gif, you see on the left side three cards being filled one after the other.

Some of the code used in animating this isotype chart

They all start off with an opacity of 0, which makes them invisible (or transparent, technically). What the animation does is make each of the cards visible by changing the opacity to 1 (see image above). This is done after different delay periods of 200 milliseconds for the bottom card, 400 for the card in the middle and 600 milliseconds for the card on top.
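The staggered delays can also be computed from each card’s position instead of being written out one by one. The sketch below assumes the cards are in a D3 selection ordered with the bottom card first; `staggerDelay` and `revealInTurn` are hypothetical names, not functions from the slideshow.

```javascript
// Delay for the i-th card, counting from the bottom (i = 0): 200, 400, 600 ms, ...
function staggerDelay(i, stepMs = 200) {
  return (i + 1) * stepMs;
}

// `cards` is assumed to be a D3 selection ordered with the bottom card first.
function revealInTurn(cards, stepMs = 200) {
  return cards
    .style("opacity", 0)                        // every card starts invisible
    .transition()
    .delay((d, i) => staggerDelay(i, stepMs))   // D3 passes the datum and the element's index
    .style("opacity", 1);                       // then each card becomes visible in turn
}
```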

FINAL WORD

If you’ve never worked with D3 before, hope this write-up encourages you to give it a shot. You can look at all the code for the slideshow in the github repo here. All comments and feedback are welcome! 🙂

COVER IMAGE CREDIT: Made in inkscape with this picture from Flickr

How to Make an Election Interactive

So I created an interactive for Wionews.com (embedded below) on the assembly elections taking place in five states. This write-up goes into how I did the interactive and the motivations behind it.




The interactive looks at three things:

  • where each party won in the last assembly election in 2012 in each of the five states, visualised with a map.
  • where each party won in the last Lok Sabha (LS) election in 2014, if the LS seats were broken up into assembly seats. This was also done with a map.
  • the share of seats won by each major party in previous assembly elections, done with a line chart.

I got all my data from the Election Commission website and the Datameet repositories, specifically the repositories with the assembly constituency shapefiles and historical assembly election results.

Now these files have a lot of information in them, but since I was making this interactive specifically for mobile screens and there wouldn’t be much space to play with, I made a decision to focus just on which party won where.

As mundane as that may seem, there are still some interesting things you get to see. For example, from the break-up of the 2014 Lok Sabha results, you find out where the Aam Aadmi Party has gained influence in Punjab since the last assembly elections in 2012, when they weren’t around.

The interactive page on the AAP in Punjab, 2014

ANALYSING THE DATA

While I got the 2012 election results directly from the Election Commission’s files, the breakdown of the 2014 Lok Sabha results by assembly seat needed a little more work: some data analysis in Python (see code below) and manual cross-checking with other Election Commission files.

Some of the python code used to break down the 2014 LS results by assembly seat. You can see all of it here.
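The original analysis was done in Python and is linked above. Purely to illustrate the idea, here is one plausible way to do the breakdown, written in JavaScript so that it matches the D3 code later in this post. It assumes you already have vote counts for each party within each assembly segment; the function name and the `segment`, `party` and `votes` field names are hypothetical.

```javascript
// For each assembly segment, pick the party with the most votes.
// Assumes rows shaped like { segment, party, votes }; these field names are made up.
function leadingPartyBySegment(rows) {
  const best = new Map(); // segment -> { party, votes } of the current leader
  for (const { segment, party, votes } of rows) {
    const current = best.get(segment);
    if (!current || votes > current.votes) {
      best.set(segment, { party, votes });
    }
  }
  // Keep only the party name for each segment
  return new Map([...best].map(([segment, leader]) => [segment, leader.party]));
}
```

Ties are resolved in favour of whichever row comes first, which is one more reason to cross-check the output manually, as described above.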

To calculate the percentages of seats won by major parties in the past, I had to do some analysis in Python of Datameet’s assembly election results file.

Some of the python code used to calculate historical seat shares of parties. You can see all of it here.
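Again, the real analysis is the Python code linked above. As a rough illustration of the calculation in JavaScript, the sketch below assumes one row per seat with the election year and the winning party; the function name and the `year` and `party` field names are hypothetical.

```javascript
// Share of seats (in percent) won by each party in each election year.
// Assumes one row per seat: { year, party }. These field names are made up.
function seatShares(winners) {
  const seatsByYear = {};
  for (const { year, party } of winners) {
    seatsByYear[year] = seatsByYear[year] || {};
    seatsByYear[year][party] = (seatsByYear[year][party] || 0) + 1;
  }
  const shares = {};
  for (const [year, parties] of Object.entries(seatsByYear)) {
    // Total seats contested that year is the sum of seats over all parties
    const total = Object.values(parties).reduce((sum, n) => sum + n, 0);
    shares[year] = {};
    for (const [party, seats] of Object.entries(parties)) {
      shares[year][party] = (seats / total) * 100;
    }
  }
  return shares;
}
```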

PUTTING IT ALL ONTO A MAP

The next thing to do was put the data of which party won where onto an assembly seat map for each state.

To get the assembly seat maps, I downloaded the assembly constituency shapefile from the Datameet repository and used the software QGIS to create five separate shapefiles, one for each of the states. (Shapefiles are what geographers and cartographers use to make maps.)

A screenshot of the QGIS software separating the India shapefile into separate ones for the states.

The next task is to make sure the assembly constituency names in the shapefiles match the constituency names in the election results. For example, in the shapefile, one constituency in Uttar Pradesh is spelt as Bishwavnathganj while in the election results, it’s spelt as Vishwanathganj. These spellings need to be made consistent for the map to work properly.

I did this with the OpenRefine software which has a lot of inbuilt tools to detect and correct these kinds of inconsistencies.
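If you would rather do this check in code, the sketch below lists shapefile names that have no exact match in the results and suggests the closest spelling using edit distance (the number of single-character changes needed to turn one string into another). This is my own illustration and not a description of how OpenRefine works internally; the function names and the threshold of 2 are assumptions.

```javascript
// Levenshtein edit distance between two strings, keeping one row of the table at a time.
function editDistance(a, b) {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = prev[0]; // value from the previous row, previous column
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j];
      prev[j] = Math.min(
        prev[j] + 1,                                // delete a character
        prev[j - 1] + 1,                            // insert a character
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitute, or keep if equal
      );
      diagonal = above;
    }
  }
  return prev[b.length];
}

// For every shapefile name without an exact (case-insensitive) match,
// suggest the closest result name within `maxDistance` edits, or null if none.
function suggestMatches(shapefileNames, resultNames, maxDistance = 2) {
  const known = new Set(resultNames.map((n) => n.toLowerCase()));
  const suggestions = [];
  for (const name of shapefileNames) {
    const lower = name.toLowerCase();
    if (known.has(lower)) continue;
    let best = null;
    for (const candidate of resultNames) {
      const distance = editDistance(lower, candidate.toLowerCase());
      if (distance <= maxDistance && (best === null || distance < best.distance)) {
        best = { candidate, distance };
      }
    }
    suggestions.push({ name, suggestion: best ? best.candidate : null });
  }
  return suggestions;
}
```

For the example above, ‘Bishwavnathganj’ and ‘Vishwanathganj’ are two edits apart (swap the B for a V, drop the extra v), so the second would be suggested for the first. The suggestions still need a human eye, since two genuinely different constituencies can have similar names.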

The purist way would have been to do all this with code, but I’ve been using OpenRefine, a graphical tool, for a while now and it’s just easier for me this way. Please don’t judge me! (Using graphical tools such as OpenRefine and QGIS makes it harder for others to reproduce your exact results and is less transparent, which is why purists look down on a workflow that is not entirely in code.)

After the data was cleaned, I merged or ‘joined’ the 2012 and 2014 election results with the shapefile in QGIS. I then converted the shapefile into the geojson format, which is easier to visualise with javascript libraries such as D3.js.
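I did the join in QGIS, but for anyone curious what a ‘join’ amounts to, here is a rough code equivalent that works on a GeoJSON object. It is only a sketch: the function name, the `AC_NAME` and `winner` property names and the shape of the `winners` rows are assumptions, not the names used in the actual files.

```javascript
// Attach the winning party to each GeoJSON feature by matching constituency names.
// `nameField` and `targetField` are hypothetical property names.
function joinWinners(geojson, winners, nameField = "AC_NAME", targetField = "winner") {
  // Build a lookup from lower-cased constituency name to party
  const lookup = new Map(winners.map((w) => [w.constituency.toLowerCase(), w.party]));
  return {
    ...geojson,
    features: geojson.features.map((feature) => {
      const name = String(feature.properties[nameField]).toLowerCase();
      return {
        ...feature,
        // Unmatched constituencies get null, which makes spelling problems easy to spot
        properties: { ...feature.properties, [targetField]: lookup.get(name) ?? null },
      };
    }),
  };
}
```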

I then chose the biggest three or four political parties in the 2012 assembly and 2014 LS election results for each state, and created icons for them using the tool Inkscape. This can be done by tracing the party symbols available in various election commission documents.

Some of the party icons designed for the interactive

HOW IT’S ALL VISUALISED

The way the interactive works is that if you click on the icon for a party, it downloads the geojson file which, to put it crudely, has the boundaries of the assembly seats and the name of the party that won each seat.

The interactive map showing the NPF in Manipur in 2014

You then get a map with the seats belonging to that party coloured in yellow. And each time you click on a different party icon, a new map is generated. (If I’ve understood the process wrong, do let me know in the comments!)

Here’s some of the d3 code used:

    map2
        .append("svg:image")  // put an image onto the canvas
        .attr("xlink:href", "../exp_buttons/bharatiya_janta_party_75.png")  // take the image from the exp_buttons folder
        .attr('height', '75')
        .attr('width', '75')
        .attr('class', 'shadow partyButton')
        .attr('id', 'bjpButton')
        .attr("x", 30)
        .attr("y", 0)
        .on("click", function () {
            map
                .append("svg:g")             // create the map
                .style("fill", "#4f504f")    // fill the map with this dark grey color
                .selectAll("path")
                .data(json.features)
                .enter()
                .append("path")
                .attr("d", pathx)
                .style("stroke", "#fdd928")  // create yellow borders
                .style("opacity", "1")
                .style("stroke-width", "1")
                .style("fill", colorParty);  // colorParty is the function defined below

            // fill the seats with yellow if they were won by the "Bharatiya Janta Party",
            // and make them dark grey if they were won by someone else
            function colorParty(d) {
                if (d.properties.uttarakhand_2012_2012_1 == "Bharatiya Janta Party") {
                    return "#fdd928";
                } else {
                    return "#4f504f";
                }
            }
        });
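Since every party button needs its own version of `colorParty`, one way to avoid repeating it is a small function that builds the colouring function for any property and party name. This is a suggested refactoring rather than what the interactive actually does; `makeColorParty` is a hypothetical name, while the colours, property name and party name are the ones from the snippet above.

```javascript
// Build a fill function for any party: yellow for seats it won, dark grey otherwise.
// `field` is the feature property holding the winner's name for a given state and year.
function makeColorParty(field, party, wonColor = "#fdd928", otherColor = "#4f504f") {
  return function colorParty(d) {
    return d.properties[field] === party ? wonColor : otherColor;
  };
}
```

It would be used as `.style("fill", makeColorParty("uttarakhand_2012_2012_1", "Bharatiya Janta Party"))`, with a different property or party name passed in for each button.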

I won’t go into the nitty-gritty of how the line chart works, but essentially, every time you click on one of these icons, the opacity of the line representing that party is set to 1, making it visible, while the opacity of every other line is set to 0, making them invisible.

The historical performance of the MGP in Goa.

Here’s some of the relevant d3 code:

    svg
        .append("svg:image")  // this tells D3 to put an image onto the canvas
        .attr("xlink:href", "../exp_buttons/bharatiya_janta_party_75.png")  // and this will be the bjp image located in the exp_buttons folder
        .attr('height', '75')
        .attr('width', '75')
        .attr('class', 'shadow partyButton')  // this is what gives a button the shadow, attributes derived from css
        .attr('id', 'bjpButton')
        .attr("x", 0)
        .attr("y", height + margin.top + 20)
        .on("click", function () {
            d3.selectAll(".line:not(.bjpLine)").style("opacity", "0");  // make all other lines invisible
            d3.selectAll(".bjpLine").style("opacity", "1");             // make the BJP line visible
            // remove the drop shadow from the BJP button so that people know it's active
            d3.select(this).classed({'shadow': false});
            // put a drop shadow back onto the other buttons in case they were active
            d3.selectAll('.partyButton:not(#bjpButton)').classed({'shadow': true});
        });
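The same click logic can be written once for all parties by deriving the selectors from a short party key. This is only a sketch of a possible refactoring: `partySelectors` and `showParty` are hypothetical names, and it assumes every party follows the `bjpLine` / `bjpButton` naming pattern used above. The `d3` object is passed in as an argument so the function is easy to try out on its own.

```javascript
// Derive the CSS selectors for a party from a short key such as "bjp".
function partySelectors(partyKey) {
  return {
    line: `.${partyKey}Line`,
    otherLines: `.line:not(.${partyKey}Line)`,
    button: `#${partyKey}Button`,
    otherButtons: `.partyButton:not(#${partyKey}Button)`,
  };
}

// Show one party's line, hide the rest, and update the button shadows.
function showParty(d3, partyKey) {
  const s = partySelectors(partyKey);
  d3.selectAll(s.otherLines).style("opacity", "0");      // hide every other line
  d3.selectAll(s.line).style("opacity", "1");            // show this party's line
  d3.select(s.button).classed("shadow", false);          // active button loses its shadow
  d3.selectAll(s.otherButtons).classed("shadow", true);  // the others get theirs back
}
```

Each button’s click handler then shrinks to `.on("click", function () { showParty(d3, "bjp"); })`.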

I then put everything into a repository on GitHub and used GitHub Pages to ‘serve’ the interactive to users.

Now I haven’t gone into the complexity of much of what’s been done. For example, if you see those party symbols and the tiny little shadows under them (they’re called drop shadows), it took me at least two days to make that happen.

It took two days to get these drop shadows!

MOTIVATIONS BEHIND THE INTERACTIVE

As for the design, I wanted something that people could just click or swipe through without having to scroll. I also wanted to limit the data on display, giving only as much as someone can absorb at a glance.

My larger goal was to try and start doing data journalism that’s friendlier and more approachable than the stuff I’ve been doing in the past such as this blogpost on the Jharkhand elections.

I actually read a lot on user interface design. After that, I made sure that the icons people tap on their screen are large enough for their thumbs, and that the icons were placed in the lower half of the screen so that thumbs wouldn’t have to travel as much to tap on them. I also adopted flat design, with just a few drop shadows and not too many of what are called skeuomorphic effects.

Another goal was to allow readers to get to the information they’re most interested in just by tapping on various options, without having to wade through paras of text.

The sets of options available to the user while in the interactive

I hacked a lot of D3.js examples on bl.ocks.org and stackoverflow.com to arrive at the final interactive. I’m still some way away from writing d3 code from scratch, but I hope to get there soon.

Because I’m not a designer, web developer, data scientist or statistician, I may have violated lots of best practices in those fields. So if you happen to come across some newbie mistake, do let me know in the comments. I’m here to learn. Thanks! 🙂


Shijith Kunhitty is a data journalist at WION and former deputy editor of IndiaSpend. He is an alumnus of Washington University, St. Louis and Hindu College, Delhi.

Guide on Digitizing Static Maps

I was recently invited to Nagpur by a group called Center for Peoples Collective, to brainstorm doing for Nagpur the kind of things I’ve done in Pune for budget data processing/viz and mapping. We found that they didn’t have any digital data (ie, shapefile, kml etc) of Nagpur’s electoral wards, but they did have some high-res images released by Nagpur Municipal Corporation (NMC) with the boundaries marked. So I walked them through a process that I’ve worked out, which uses free online services and doesn’t need any software or advanced skills to do. I’m sharing that process here.

Sikkim

#LATEPOST

The Sikkim State Government passed an open data policy, the Sikkim Open Data Acquisition and Accessibility Policy, in 2014. With a push from the Chief Minister and the Member of Parliament, the Honorable Prem Das Rai, the government turned to open data to take control of the state’s data. One problem the Honorable Mr PD Rai has repeatedly mentioned is the lack of access to government information on demand. It is not uncommon for lawmakers to ask questions only to have to wait a day or more for the answer, losing the moment to use that information for decision making.

An Open Data for Human Development Workshop was organized by the International Centre for Human Development of UNDP India, with the Centre for Internet and Society, AKVO, Mapbox and DataMeet co-facilitating the event in Bangalore last June. The aim was to bring together members of the Sikkim government, IT professionals, and open data enthusiasts.


In April, before the workshop, Sumandro (CIS) and I went to Sikkim for a pre-consultation with the Sikkim government on how to prepare for the large workshop in Bangalore. We met with the MP and the heads of the Rural Development, Health, and IT departments to discuss their plans to implement their open data policy. Then there was a large meeting with all the departments and the MP. We presented different things you can do when data is opened and offered suggestions for how to implement the policy. The departments took turns discussing their issues regarding implementation; concerns like server space, technology needs, and how to create incentives for accurate and timely data uploading were shared.

We presented things for them to think about in preparation for the June event and for how to work with the open data community in India.

In June the workshop was held at NIAS. Thej gave a session on data tools that can be used to assemble, clean, analyze, publish and visualize data. Some of the tools that he introduced and used during the workshop are:

  • Tabula – It’s difficult to extract data from PDFs, but Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.
  • Open Refine – a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase.
  • DataWrapper – allows you to create powerful charts very easily.
  • CartoDB – an easy way to map and analyze your location data.

“Overall interaction was great. Delegates from Sikkim were very interested in DataMeet community and work we do as community. Some part of the workshop was used to introduce the community aspect of Data.”

You can see the full notes of the event at Centre for Internet and Society’s blog.

We are looking forward to seeing Sikkim become the first state to implement an open data portal using the Data.Gov.In platform.

To Hack or Not to Hack….

Hackathons are a source of confusion and frustration for us. DataMeet deliberately does not do them unless there is a very specific outcome the community wants, like freeing a whole dataset or introducing open data to a new audience. We feel that they cause burnout, are not productive, and in general don’t help create a healthy community of civic tech and open data enthusiasts.

That is not to say we feel others shouldn’t do them. They are very good opportunities to spark discussion and introduce new audiences to problems in the social sector. DataKind, RHOK and numerous others host hackathons or variations of them regularly to stir the pot and bring new people into civic tech, and these can be successful starts to long-term connections and experiments. A lot of people in the DataMeet community participate in and enjoy hackathons.

However, with great data access comes great responsibility. We always want to make sure that even if opening a dataset produces no output, at least no harm is done.

Last October an open data hackathon, Urban Hack, run by Hacker Earth, NASSCOM, XEROX, IBM and World Resource Institute India, wanted to bring out open data and spark innovation in the transport and crime space by making datasets from Bangalore Metropolitan Transport Corporation (BMTC) and the Bangalore City Police available to work with. A DataMeet member (Srinivas Kodali) was participating. He is a huge transport data enthusiast and wanted to take a look at what was being made available.

In the morning, shortly after it started, I received a call from him saying that a dataset had been made available that seemed to violate privacy and data security. We contacted the organizers and they took it down. Later we realized it was quite a sensitive dataset and a few hundred people had already downloaded it. We were also distressed that they had not clarified the ownership or license of the data, and had linked to sources like Open Bangalore without specifying licensing, which violated the license.

The organizers were quite well known and had been involved with hackathons before, so it was a little distressing to see these mistakes being made. We were concerned that the government partners (who had not participated in these types of events before) were also being exposed to poor practices. As smart cities initiatives take over the Indian urban space, we began to realize that this is a mistake that shouldn’t happen again.

Along with Centre for Internet and Society and Random Hacks of Kindness we sent the organizers, Bangalore City Police and BMTC a letter about the breach in protocol. We wanted to make sure everyone was aware of the issues and that measures were taken to not repeat these mistakes.

You can see the letter here:

We are very proud of the DataMeet community and Srinivas for bringing this violation to the attention of the organizers. As people who participate in hackathons and other data events it is imperative that privacy and security are kept in mind at all times. In a space like India where a lot of these concepts are new to institutions, like the Government, it is essential that we are always using opportunities not only to showcase the power of open data but also good practices for protecting privacy and ensuring security.

Map of Electoral districts of Sri Lanka

Maps of Sri Lanka’s electoral districts are now available for download. I initially made them for a friend who wanted to analyze the election results. The electoral districts are derived from the administrative maps.


You can check the diff on github to see how the maps were changed.

The GADM database of Global Administrative Areas is the source of the administrative data. I used three simple online tools:

  • GeoJSON.io for converting from KML to GeoJSON and adding attributes.
  • MapShaper for merging the areas
  • GitHub for storing the map files.

Note: I don’t provide any guarantee on the accuracy of the maps, so don’t use them if you need accurate maps. I have made notes on how these maps were derived. Use them if you think the process is right. Raise an issue if you find anything.

Nepal Needs You to Make Maps!

Post by Tejas AP

The Humanitarian OpenStreetMap Team (HOTOSM) has activated to support crisis response in Nepal after the recent devastating earthquake. A global team of volunteers is contributing to the OSM project by mapping physical infrastructure (roads and buildings) as well as traces and areas safe for crisis responders to use and congregate at. We believe improved information, especially of the remote affected areas, is crucial to improve the efforts carried out by relief agencies on-ground.

Volunteers may contribute to the map of Nepal simply by selecting a task from the wiki. Basic questions about registering and using the OSM mapping tool can be found in its comprehensive documentation here.

While the volunteers have been recording road networks and buildings at a rapid pace, we understand that the communication networks in Nepal are still being restored, and crisis responders might not have access to navigation maps to expedite their efforts. We want to help in ensuring that people have access to map data in every manner possible.

We want to print offline maps and send them with relief materials from India to Nepal. Please help us by providing:

* a list of towns/villages/regions you need maps of, and

* point-of-contact we can deliver the printed maps to.

For more information, please get in touch with

Sajjad  – sajjad(at)mapbox(dot)com

Tejas  – tejaspande(at)live(dot)com

Nisha  – nisha(at)datameet(dot)com

Prabhas – prabhas.pokharel(at)gmail(dot)com

Meanwhile, here’s some information that you might find useful  –

1. https://www.mapbox.com/blog/nepal-earthquake/
2. The News Minute report – http://www.thenewsminute.com/article/how-group-individuals-all-over-india-are-making-maps-help-rescue-teams-nepal
3. HOT Wiki – https://wiki.openstreetmap.org/wiki/2015_Nepal_earthquake
4. KLL report – http://kathmandulivinglabs.org/blog/

Tool Review: WebScraper

Usually when I have any scraping to do I ask Thej if he can do it and then take a nap. However, Thej is on vacation, so I could either wait for him to come back or try to do it myself. It was basic text, not much html, no images, and a few pages, so I went for it with some non-coder tools.

I checked the School of Data scraping section for some tools and they have a nice little section on using browser based scraping tools. I did a chrome store search and came across WebScraper.

I glanced through the video, sort of paying attention, got the gist of it and started to play with the tool. It took a while for me to figure out. I highly recommend going through the tutorials very carefully. The videos take you through the process but are not very clear for complete newbies like me, so it took a few views to understand the hierarchy concept and how to adapt their example to the site I was scraping.

I got the hang of doing one page and then had to figure out how to tell it to go to another page. Again, I had to spend quite a bit of time rewatching the tutorial.

At the end of the day I got the data in neat columns in CSV without too much trouble.  I would recommend WebScraper for people who want to do some basic scraping.

It is as visual as you can get, though the terminology is still very technical. You have to go into the developer tools, which can feel intimidating but is ultimately satisfying.

Though I’ll probably still call Thej.