
Working With Data



  1. Tabula - Extract data from PDFs
  2. Coherent PDF Command line tools - Powerful, free tools to manipulate PDF files
  3. PeePDF - Python PDF analysis tool.
  4. Accurately extract tables from PDFs.
  5. Tabula allows you to extract that PDF data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface.


  1. TAGS is a free Google Sheets template that lets you set up and run automated collection of search results from Twitter.
  2. Twitter Capture and Analysis Toolset (DMI-TCAT) - A set of tools from the Digital Methods Initiative to retrieve and collect tweets from Twitter and to analyze them in various ways. It is written mostly in PHP and runs in a web server (LAMP) environment.

Working with Other Formats

  1. SVG Crowbar - Export infographics in your browser to images.
  2. Tabletop.js - Interface for Google Spreadsheet.
  3. Miso Dataset - Client-side data transformation and management library
  4. tablesorter.js - Client-side table sorting based on jQuery.
  5. ScraperWiki can be used for scraping the data.
  6. Wrangler is an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data.
  7. OpenHeatMap - Turn your spreadsheet into a map
  8. Plotly is a collaborative data analysis and graphing tool.
  9. googleVis - Interface between R and the Google Chart Tools.
  10. Deduplicate Tags - Replicates the tags in a tag cloud by their value.
  11. Disqus Comment Scraper - This tool scrapes threads and comments from websites implementing the Disqus commenting system.
  12. netvizz - Extracts various datasets from Facebook.
  13. Table 2 Net - Extract a network from a table. Set a column for nodes and a column for edges. It deals with multiple items per cell.
  14. ScienceScape - Helpers for scientometrics. Convert files, get networks, visualize stuff from Scopus or Web of Knowledge.
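
The Table 2 Net entry above describes turning a table into a network by choosing one column for nodes and one for edges, with support for several items per cell. The following is a minimal Python sketch of that idea, not Table 2 Net's own code: it assumes that two nodes are linked whenever they share a value in the edge column, and that multiple items in a cell are separated by a semicolon. The function name `table_to_network`, the column names, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def table_to_network(rows, node_col, link_col, sep=";"):
    """Link two nodes whenever they share a value in link_col.

    Assumption: cells may hold several items separated by `sep`.
    Returns a dict mapping (node_a, node_b) to the number of shared values.
    """
    # Group nodes by each value found in the link column.
    members = defaultdict(set)
    for row in rows:
        nodes = [v.strip() for v in row[node_col].split(sep) if v.strip()]
        links = [v.strip() for v in row[link_col].split(sep) if v.strip()]
        for link in links:
            members[link].update(nodes)

    # Every pair of nodes sharing a link value gets an edge; the weight
    # counts how many link values the pair has in common.
    edges = defaultdict(int)
    for nodes in members.values():
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical sample table: authors as nodes, keywords as links.
SAMPLE = """author,keywords
Ada,networks; maps
Ben,maps
Cy,networks; scraping
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
network = table_to_network(rows, "author", "keywords")
print(network)
```

In this sample, Ada and Cy are linked through "networks" and Ada and Ben through "maps", while "scraping" produces no edge because only one node carries it. The resulting edge list could then be written out as CSV for a graph tool to import.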



tools/working_with_data.txt · Last modified: 2014/11/26 09:17 by thejeshgn