DataMeet 6 was a two-day data science hackathon organised by a BFSI company, Zone Startups and DataMeet Mumbai. The hackathon took place at Zone Startups' office in the Bombay Stock Exchange building. Twelve teams participated, including teams of young data enthusiasts and specialist data science teams from companies like TCS and Housing.com.
The BFSI company opened up 80GB of its real transactional data in a secure environment to the participating data enthusiasts.
The teams were expected to analyse the data and draw out insights relevant to their use case scenarios, such as Health Bankruptcy, or to uncover a hidden trend that was unknown to the BFSI company. Teams were free to use any tool of their choice, such as R, Python or Tableau.
Each team was given its own secure Oracle DB connection, through which it could query the data but not download it. The connections were opened only to the static IPs of the Zone Startups office, and traffic to and from the servers was monitored to guard against downloading of the data.
The day started with the teams analysing the raw data, the tables and the meaning of the columns. Representatives from the BFSI company also gave a briefing on the objectives.
Many of the young teams did not turn up on Day 2 because of the complexity of the problem. At the end of Day 2, the judges from the BFSI company evaluated each team’s progress and gave feedback and suggestions.
Mumbai had its fourth data meet on December 6, 2014, with a total of 11 participants. Due to scheduling issues, the November meet-up was moved from the last Saturday of the month to the first Saturday of December. This time the meet-up was held at Pykih’s office on the 8th floor of the Sardar Patel Institute of Technology.
The speaker was Bhavin Dalal, Senior Technology Manager at Hansa Cequity.
At Cequity, he plays multiple roles, including solution architect, consultant and project manager. While he has strong product framework knowledge, his expertise lies in data warehousing technologies.
Bhavin spoke on two main topics:
1. Data Cleaning – he explained what data quality is and which factors determine it. He ran through the common data quality problems faced while cleaning data, and showed us an example in which his team ran into problems cleaning car data and how they solved them. He also explained data cleaning methods, which helped us understand the different approaches to data cleaning, why it matters, and some dos and don’ts while capturing data.
2. Visualising census data for a better understanding of India – here he gave us a list of eye-popping facts drawn from the census data. This topic showed us that there is a plethora of data points that can be used to come up with really good insights into the Indian population.
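The kind of cleaning described in the first topic can be sketched in a few lines of Python. This is only an illustration and not the method Bhavin presented: the record fields (`make`, `registration`), the alias table and the sample rows are all hypothetical. It shows three common steps: standardising inconsistent values, rejecting rows with missing mandatory fields, and dropping duplicates.

```python
def clean_records(records, aliases):
    """Return (cleaned, rejected) lists for hypothetical car records.

    - Standardise: collapse whitespace, lower-case the make, map aliases.
    - Reject: rows missing a make or a registration number.
    - De-duplicate: keep the first row seen for each registration number.
    """
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        make = " ".join(str(rec.get("make") or "").split()).lower()
        make = aliases.get(make, make)
        reg = str(rec.get("registration") or "").replace(" ", "").upper()
        if not make or not reg:
            rejected.append(rec)  # keep the raw row so it can be reviewed
            continue
        if reg in seen:
            continue  # duplicate of a row already kept
        seen.add(reg)
        cleaned.append({"make": make, "registration": reg})
    return cleaned, rejected


# Hypothetical sample data, invented for this sketch.
ALIASES = {"maruti": "maruti suzuki", "tata motors": "tata"}
RECORDS = [
    {"make": " Maruti  Suzuki ", "registration": "mh 01 ab 1234"},
    {"make": "MARUTI", "registration": "MH01AB1234"},   # duplicate
    {"make": "hyundai", "registration": ""},            # missing value
    {"make": "Tata Motors", "registration": "MH 02 CD 5678"},
]

cleaned, rejected = clean_records(RECORDS, ALIASES)
```

Rejected rows are returned rather than silently discarded, so that problems introduced at data capture can be traced back and fixed at the source.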
The next data meet will be held on the last Saturday of December 2014. Please follow the Mumbai Meet-Up Group for details.
Written by Sanjit Oberai
Mumbai’s first Data Meet kicked off on 30th August at the Sardar Patel Institute of Technology, with a total of 26 people attending. It started with a round of introductions by the attendees, a mix of developers, journalists, students and data enthusiasts.
This was followed by an introduction from Ritvvij, the founder of Pykih, who spoke about how important it is to have a data group in Mumbai. (Listen to the recording here.)
The first talk was by Ajaj Kelkar, co-founder of Hansa Cequity (http://hansacequity.com). He gave an introduction to how the recent Open Data movement started about 5-7 years back in the US, out of a need to move data from the private space into the public space, which was made possible by an active push from transparency groups. The idea spread, and many progressive countries realised that this could be a powerful movement for the public good.
Data can be used to help make decisions at both the social and the personal level. He highlighted that there are barriers that we need to overcome. He explained how many cities abroad have appointed Chief Data Officers, whose job is to monitor each city's data, e.g. municipal budgets.
Another important point he raised was privacy. As consumers, we leave a lot of data on social media platforms. In the absence of proper laws, we need to think carefully about what we put out there.
The second talk was by Ritvvij, founder of Pykih (http://www.pykih.com), who explained the importance of visualising data well. He emphasised that a lot of people are not aware of how to use the correct tools to visualise data, and that the most common mistake people make is with the humble pie chart. He went on to explain the visualisation mistakes that many news organisations make today and what they could do to improve their charts and graphs.
He also gave examples of his recent work with Firstpost.com, for which he made custom visualisations for the elections and the IPL. He has also worked with narendramodi.in.
He also spoke about the issues most journalists face when dealing with government data and how difficult it is for them to get access to it. He ended by stating that there is a dire need for a tool that could become the CMS for data journalism, thereby allowing journalists to focus on the story rather than on data janitorial work.
The third talk was by Sanjit Oberai, Deputy Editor of IndiaSpend, a non-profit that uses data to tell stories. He spoke about how there are tonnes of stories buried in government data and how to write articles around them. He covered how they research articles, what their sources of data are, and how one can visualise data using free-to-use tools like Datawrapper, Knoema and Tableau.
He also spoke about a new initiative called Fact Check, which can be used to bring about accountability and raise the common man’s awareness. He cited the example of the Goa MLAs who were going to Brazil, which created quite a stir, with the Congress calling the trip a wasteful expenditure. A quick fact check was done on the assets the MLAs had declared in the sworn affidavits they filed as candidates before the 2012 legislative elections.
He also spoke of a Data Room, a first-of-its-kind resource for students, journalists and researchers that will allow comparison of state-wise data on population, health, education, etc.
The last talk was by Srinivas Kodali, an IIT-Madras graduate who is researching the transport data of cities. He explained how one can scrape data from websites and demoed tools like Selenium for scraping sites that generate data on the fly using AJAX. However, Selenium is a front-end tool typically used for testing and requires an open browser session, so it cannot be used for large-scale scraping. He then went on to showcase PhantomJS, which allows scraping as Selenium does but in a headless fashion, i.e. without the browser UI.
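To see why a browser-driven tool is needed for such sites at all, here is a small Python sketch using only the standard library. It is not code from the talk: both HTML snippets, the `routes` table and the `loadRoutes` script call are hypothetical. The first snippet stands in for what a plain HTTP fetch would return, the second for the page after a browser (with or without a UI) has run the script.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the stripped text of all table cells found in `html`."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Hypothetical page as fetched over plain HTTP: the table body is empty
# and a script is expected to fill it in later via AJAX.
RAW_HTML = """
<table id="routes"><tbody></tbody></table>
<script>loadRoutes('/api/routes');</script>
"""

# Hypothetical DOM after a browser has executed the script.
RENDERED_HTML = """
<table id="routes"><tbody>
<tr><td>Route 1</td><td>Dadar</td></tr>
<tr><td>Route 2</td><td>Andheri</td></tr>
</tbody></table>
"""

raw_cells = extract_cells(RAW_HTML)            # nothing to scrape yet
rendered_cells = extract_cells(RENDERED_HTML)  # data is now present
```

The parser is the same in both cases; only the input differs. Tools like Selenium and PhantomJS are what produce the second kind of input, by actually running the page's JavaScript before the HTML is read.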
The next meeting will be held at the end of September. Details will be posted on the site soon.