Name Description URL
Digital Research Tools (DiRT)

The DiRT Directory "aggregates information about digital research tools for scholarly use". DiRT is an evolving project originally founded by Lisa Spiro. Its aim is to make it easy for digital humanists and other scholars conducting digital research to find the tools they need for their projects. DiRT facilitates access to a wide variety of digital scholarly tools, ranging from blogging platforms and linguistic research tools to annotation resources and data visualization software. Managed by an editorial board of approximately 20 members, the directory is constantly expanding and evolving as the team works to ensure "the coverage and accuracy of the directory's tool listings".

http://dirtdirectory.org

TAPoR (Text Analysis Portal for Research)

The Text Analysis Portal for Research (TAPoR) is "both a resource for discovery and a community". "The TAPoR team has created a place for Humanities scholars, students and others interested in applying digital tools to their textual research to find the tools they need, contribute their experience and share new tools they have developed or used with others". TAPoR is a comprehensive database of textual analysis tools: programs that use computational methods to manipulate, visualize, edit, categorize, and search texts. TAPoR has recently been expanded to provide a more interactive environment. Users can now evaluate, review, and sort the tools already available in the TAPoR database, as well as contribute new tools to be included on the website.

http://tapor.ca

R

"R is a language and environment for statistical computing and graphics". R facilitates a wide variety of statistical and graphical techniques for data manipulation, calculation, and graphical display. R is characterized as an environment as opposed to being a tool because it is a "fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools". "One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control". R is an open source program and is freely available.

http://www.r-project.org/

Digital Methods Initiative (DMI)

Founded in 2007 but in development since the late 1990s, DMI is a collaborative project based at the University of Amsterdam. DMI is a "contribution to doing research into the 'natively digital'". The goals of the project are twofold: first, to interrogate virtual methods in order to evaluate the difference that new media make and, second, to create a platform that displays the tools and sources that can be used in digital research. DMI provides practical resources (such as how-tos) and critical resources (outlooks and critiques) tied to specific tools, to aid scholars in using and evaluating these web-based programs.

https://wiki.digitalmethods.net/Dmi/ToolDatabase

Mallet

Mallet, or the MAchine Learning for LanguagE Toolkit, "is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text". Mallet's tools cover five functions: importing data, classifying documents, sequence tagging, topic modelling, and numerical optimization. Mallet also offers an add-on package, GRMM, that extends the toolkit with support for general graphical models. Each of the Mallet categories functions as a toolkit in its own right, equipped with several applications and resources that may be useful to scholars conducting that kind of research.

http://mallet.cs.umass.edu/

Stanford Topic Modeling Toolbox

Stanford Topic Modeling Toolbox (TMT) is a resource developed by The Stanford Natural Language Processing Group. TMT "brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component". TMT features the ability to import and manipulate texts, train topic models to create textual summaries, and generate compatible "outputs for tracking word usage across topics, time, and other groupings of data". TMT was written in 2009-2010 and uses an old version of Scala. The program is no longer being updated and The Stanford Natural Language Processing Group no longer provides support for users, but "some people still use it and find it a friendly piece of software for LDA and Labeled LDA models".

http://nlp.stanford.edu/software/tmt/tmt-0.4/

Gensim

Gensim began as "a collection of various Python scripts for the Czech Digital Mathematics Library dml.cz in 2008, where it served to generate a short list of the most similar articles to a given article". It was created to address the challenges of efficiency, scalability, and computational power in this library system. Its documentation describes Gensim as "the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text. It stands in contrast to brittle homework-assignment-implementations that do not scale on one hand, and robust java-esque projects that take forever just to run “hello world”". Gensim is open-source, platform-independent software.

http://radimrehurek.com/gensim/index.html
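
The original task described above, producing a short list of the articles most similar to a given article, can be illustrated with a minimal sketch. This is not Gensim's API or its method: it assumes a plain bag-of-words representation compared by cosine similarity, uses only the Python standard library, and the function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, articles, top_n=2):
    """Rank every other article by its similarity to the query article."""
    vectors = {doc_id: bag_of_words(text) for doc_id, text in articles.items()}
    query = vectors[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy corpus, invented for illustration only.
articles = {
    "a1": "prime numbers and the distribution of primes",
    "a2": "the distribution of prime numbers in arithmetic progressions",
    "a3": "heat flow on compact manifolds",
}
ranking = most_similar("a1", articles)
```

Here `ranking` lists the other articles from most to least similar to `a1`. Raw word counts are the simplest possible representation; a library such as Gensim is aimed at richer semantic models and at corpora too large for this approach.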

Topic Modelling Tool

Topic Modelling Tool "is a simple GUI-based application for topic modeling that uses the popular MALLET toolkit for the back-end". Topic modelling is a "way to analyze large volumes of unlabeled text" by generating topics: "clusters of words that frequently occur together". Topic Modelling Tool uses contextual clues to connect words with similar meanings and to differentiate words with multiple meanings. In the tool's basic mode, the user inputs their data and specifies the number of topics to generate. Once these parameters have been set, the tool sorts through the data and generates a report on the resulting topics.

https://code.google.com/p/topic-modeling-tool/
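
The idea of "clusters of words that frequently occur together" can be made concrete with a very small sketch. This is plain co-occurrence counting, not the statistical topic model that MALLET fits, and it is far cruder; the function name and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each pair of words, how many documents contain both."""
    counts = Counter()
    for text in documents:
        # Use each distinct word once per document, in sorted order so that
        # a pair is always stored as (earlier_word, later_word).
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical documents, invented for illustration only.
documents = [
    "whale ship sea",
    "ship sea captain",
    "love letter heart",
    "heart love",
]
counts = cooccurrence_counts(documents)
top_pairs = counts.most_common(2)
```

In this toy corpus the pairs `("sea", "ship")` and `("heart", "love")` each appear in two documents, which hints at two groupings of vocabulary. A real topic model goes further by assigning every word a probability under each topic.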

Gephi

Gephi is an "interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs". Described as "Photoshop but for data", Gephi is a "tool for people that have to explore and understand graphs". The aim of Gephi is to "help data analysts to make hypothesis, intuitively discover patterns, isolate structure singularities or faults during data sourcing". Gephi was designed to assist with exploratory data analysis, link analysis, social network analysis, and biological network analysis. Its features include easy node grouping, manipulable multi-level layouts, real-time user interaction, colour-coded partitioning, and reports on centrality and other calculable characteristics of the network.

https://gephi.github.io/
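
As an illustration of what a centrality report measures, the sketch below computes degree centrality, one of the simplest such measures, for a small undirected network. It is not Gephi's implementation; the function name and the example network are hypothetical, and the sketch assumes the edge list contains no self-loops.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by (n - 1) for an undirected graph."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical correspondence network, invented for illustration only.
edges = [("Ada", "Ben"), ("Ada", "Cleo"), ("Ada", "Dev"), ("Ben", "Cleo")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

A score of 1.0 means a node is directly connected to every other node in the network, as "Ada" is here.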

Graphviz

Graphviz is an "open source graph visualization software" that facilitates the representation of information in the form of an abstract diagram, graph, or network. The Graphviz layout programs "take descriptions of graphs in a simple text language, and make diagrams in useful formats, such as images and SVG for web pages; PDF or Postscript for inclusion in other documents; or display in an interactive graph browser". Users have complete control over the colour, font, layout, and shape of their visualization. The choice of shape or layout depends on the type of data and the nature of the research questions being posed. Additionally, Graphviz offers many web-based and interactive interfaces, auxiliary tools, libraries, and language bindings.

http://www.graphviz.org/
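
To give a sense of the "simple text language" that Graphviz reads (known as DOT), the sketch below builds a DOT description of a small undirected graph from a list of edges. The helper name `to_dot` and the example edges are hypothetical, and the sketch assumes node names contain no double quotes.

```python
def to_dot(edges, name="G"):
    """Build a DOT description of an undirected graph from (source, target) pairs."""
    lines = [f"graph {name} {{"]
    for source, target in edges:
        # Assumption: node names contain no double quotes, so no escaping is done.
        lines.append(f'    "{source}" -- "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges, invented for illustration only.
dot_text = to_dot([("Ada", "Ben"), ("Ada", "Cleo")])
print(dot_text)
```

Saved to a file such as `network.dot`, this text could then be rendered by one of the Graphviz layout programs, for example with `dot -Tsvg network.dot -o network.svg`.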