Word and Phrase

Word and Phrase is a free, web-based text analysis tool created by Dr. Mark Davies of Brigham Young University. Users can paste texts directly into the box provided, and a single search returns a range of detailed information about the text's words and phrases on one screen. [Credit to TAPoR for this exceptional annotation]


Mandala

Mandala is a "rich-prospect browsing interface that allows users to explore a data set using multiple criteria". Mandala accepts .txt, .rtf, and .pdf files but works best with .csv or .xml files in which the documents are organized into searchable columns and fields. The Mandala browser reads these files and produces data visualizations based on criteria the user sets. Once the text has been imported, Mandala's live interface lets users manipulate and collate the data to meet their research requirements. The data is sorted visually on a circular central palette, and the user can reconfigure it directly in the live browser. The data can be exported as text or image files.

Collocation (TAPoR)

"TAPoRware is a set of text analysis tools that enables users to perform text analysis on HTML, XML and plain text files, using documents from the users' machine or on the web". Collocation is one of the tools housed under the TAPoR umbrella. It lets users search a specific document for words that occur together: TAPoR scans the uploaded document or URL for the words or patterns the user specifies, within constraints the user also defines, and produces a report counting how many times the specified words appear together in the selected document. Collocation extraction of this kind is closely related to keyword-in-context (KWIC) searches.
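
The kind of counting a collocation report performs can be illustrated with a minimal Python sketch. This is not TAPoRware's implementation: the function names `collocates` and `kwic`, the simple regex tokenizer, and the fixed word window on each side of the node word are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on runs of word characters (a deliberately simple tokenizer).
    return re.findall(r"\w+", text.lower())


def collocates(text, node, window=3):
    """Count words occurring within `window` tokens on either side of each `node`."""
    tokens = tokenize(text)
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


def kwic(text, node, window=3):
    """Return keyword-in-context lines, with the node word in brackets."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


sample = "The old sea and the old ship. The sea was calm; the ship was old."
print(collocates(sample, "sea", window=2).most_common(3))
for line in kwic(sample, "sea", window=2):
    print(line)
```

Looking up a single collocate in the returned counter, for example `collocates(sample, "sea", window=2)["calm"]`, gives the number of times that pair occurs together within the chosen window.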


Scrapy

Scrapy is "an open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way". It is a web-crawling framework designed to retrieve structured, useful data from websites for data mining, information processing, or historical archiving. Scrapy can extract data from nearly any website by letting users write their own Spiders: Python classes that give directions for locating and retrieving a site's data. Scrapy is free, easily extensible, and runs on Windows, Mac, and Linux.

CATMA: Computer Aided Textual Markup & Analysis

CATMA is a "practical and intuitive tool for literary scholars, students and other parties with an interest in text analysis and literary research". CATMA supports efficient literary analysis by "helping perform many of the procedures [...] that normally have to be carried out entirely manually". Its key features include advanced text search, visualization of the distribution of items of interest, analysis of a whole corpus of texts in one step, easy switching between modules, and Tagsets that users can define freely. "CATMA integrates three functional, interactive modules: the Tagger, the Analyzer and the Visualizer": the Tagger provides a graphical interface for textual markup, the Analyzer includes a Query Builder that runs complex data queries, and the Visualizer offers a wide range of charting options suited to the user's needs and preferences.


GeoNames

GeoNames is a massive geographical database that contains "over 10 million geographical names and consists of over 9 million unique features". GeoNames links geographical data with place names in various languages, physical features (area, elevation, latitude/longitude), and social statistics (population, currency, postal codes, national flag). GeoNames is a collaborative project that encourages user participation by allowing users to "manually edit, correct and add new names using a user friendly wiki interface". It is a worldwide initiative managed and maintained by ambassadors around the globe who lend their help and expertise to the project's development.
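
GeoNames also distributes its data as tab-separated text dumps. The Python sketch below reads such records and looks places up by any of their names, including names in other languages. It assumes the 19-column layout described in the GeoNames dump readme (geonameid, name, asciiname, alternatenames, latitude, longitude, and so on); the `Place` class, the helper names, and the "Exampleville" record are hypothetical.

```python
import csv
from dataclasses import dataclass, field

# Column order assumed from the GeoNames dump readme (19 tab-separated fields).
FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


@dataclass
class Place:
    """Hypothetical subset of the fields in one GeoNames record."""
    geonameid: int
    name: str
    latitude: float
    longitude: float
    feature_code: str
    country_code: str
    population: int
    alternate_names: list = field(default_factory=list)


def parse_dump(lines):
    """Yield Place objects from tab-separated GeoNames-style lines."""
    # QUOTE_NONE: quote characters inside names are kept as ordinary text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        rec = dict(zip(FIELDS, row))
        yield Place(
            geonameid=int(rec["geonameid"]),
            name=rec["name"],
            latitude=float(rec["latitude"]),
            longitude=float(rec["longitude"]),
            feature_code=rec["feature_code"],
            country_code=rec["country_code"],
            population=int(rec["population"] or 0),
            # alternatenames is a comma-separated list that may be empty.
            alternate_names=[n for n in rec["alternatenames"].split(",") if n],
        )


def find_by_name(places, query):
    """Return places whose main or alternate name matches `query`, ignoring case."""
    q = query.casefold()
    return [p for p in places
            if p.name.casefold() == q or any(n.casefold() == q for n in p.alternate_names)]


# A made-up record for illustration only; real data comes from the GeoNames dumps.
sample = "\t".join([
    "1", "Exampleville", "Exampleville", "Exempelstad,Villeexemple",
    "45.5", "-73.6", "P", "PPL", "CA", "", "10", "", "", "",
    "1200", "", "30", "America/Toronto", "2024-01-01",
])
places = list(parse_dump([sample]))
print(find_by_name(places, "villeexemple"))
```

Matching on the alternate names is what lets a query in one language find a place recorded under its name in another.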

Text Analysis For Me Too (TAToo)

Brat Rapid Annotation Tool

"Brat is a web-based tool for text annotation". Brat is designed specifically for structured annotation, in which notes take a fixed form and can be categorized easily to support automated processing and interpretation. Brat supports four types of fixed annotation: text span annotations, suited to creating categorical annotations for entities; relation annotations, suited to drawing simple relationships between entities; n-ary associations, which link annotations to specific roles; and normalization annotations, which associate internal annotations with external resources. Any annotation can be further described and categorized with attributes that qualify the base annotation, much as adjectives modify a noun. Brat's user-friendly features include comprehensive visualization, editing, integration with external resources, annotation in any language, and easy export in multiple formats.
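
Brat stores annotations in a plain-text "standoff" file (`.ann`) kept alongside the source text, with one annotation per line and an ID prefix marking its type: `T` for text spans, `R` for relations, `E` for n-ary associations (which brat calls events), `A` for attributes and `N` for normalizations. The Python sketch below reads that format into dictionaries. It assumes the tab-separated layout from brat's standoff format documentation and handles only these five line types; the `parse_ann` helper, the sample sentence and the `ExampleKB` reference are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotations:
    spans: dict = field(default_factory=dict)           # T: id -> (type, [(start, end)], text)
    relations: dict = field(default_factory=dict)       # R: id -> (type, {role: target})
    events: dict = field(default_factory=dict)          # E: id -> (type, trigger, {role: target})
    attributes: dict = field(default_factory=dict)      # A: id -> (name, target, value)
    normalizations: dict = field(default_factory=dict)  # N: id -> (type, target, ref, text)


def parse_ann(ann_text):
    """Parse brat standoff (.ann) lines; other line types such as #notes are skipped."""
    ann = Annotations()
    for line in ann_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ann_id, rest = line.split("\t", 1)
        kind = ann_id[0]
        if kind == "T":
            # "Type start end[;start end...]<TAB>covered text"
            meta, covered = rest.split("\t", 1)
            ann_type, offsets = meta.split(" ", 1)
            frags = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            ann.spans[ann_id] = (ann_type, frags, covered)
        elif kind == "R":
            # "Type Arg1:T1 Arg2:T2"
            ann_type, *args = rest.split()
            ann.relations[ann_id] = (ann_type, dict(a.split(":", 1) for a in args))
        elif kind == "E":
            # "Type:TriggerId Role:Id ..."
            trigger, *args = rest.split()
            ann_type, trigger_id = trigger.split(":", 1)
            ann.events[ann_id] = (ann_type, trigger_id, dict(a.split(":", 1) for a in args))
        elif kind == "A":
            # "Name TargetId [Value]"; binary attributes carry no value
            name, target, *value = rest.split()
            ann.attributes[ann_id] = (name, target, value[0] if value else True)
        elif kind == "N":
            # "Type TargetId DB:ID<TAB>normalized text"
            meta, norm_text = rest.split("\t", 1)
            ann_type, target, ref = meta.split()
            ann.normalizations[ann_id] = (ann_type, target, ref, norm_text)
    return ann


text = "Ada Lovelace was born in London."
ann_file = "\n".join([
    "T1\tPerson 0 12\tAda Lovelace",
    "T2\tLocation 25 31\tLondon",
    "T3\tBirth 17 21\tborn",
    "R1\tBorn_in Arg1:T1 Arg2:T2",
    "E1\tBirth:T3 Person:T1 Place:T2",
    "A1\tConfidence T2 High",
    "N1\tReference T2 ExampleKB:42\tLondon",  # hypothetical knowledge-base ID
])
ann = parse_ann(ann_file)
for ann_id, (ann_type, frags, covered) in ann.spans.items():
    start, end = frags[0]
    assert text[start:end] == covered  # character offsets index into the source text
print(ann.events)
```

Because the annotations point into the text by character offset rather than being embedded in it, the source document itself never changes, which is what makes the format easy to process automatically.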


Lexomics

Lexomics is text-mining software that applies computational techniques and statistical analysis to literary questions. It searches texts for word patterns and determines how different parts of a work relate to one another. The web-based Lexomics tool "enables you to 'scrub' (clean) your unicode text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, tokenize with character- or word- Ngrams or TF-IDF weighting, and choose from a suite of analysis tools for investigating those texts". The program uses dendrograms (tree diagrams that group texts or chunks by the similarity of their word frequencies) to analyze the relationship between a text and its author, its source, or other similar texts. Other visualizations, such as word clouds and bubble visualizations, also represent word frequencies and ratios in a text or set of texts.
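
The workflow in the quotation above (scrub, cut into chunks, weight the words, then cluster the chunks into a dendrogram) can be sketched in plain Python. This is not Lexomics code: the lowercase-and-keep-word-characters scrubbing, TF-IDF weighting with log(n/df), cosine distance and single-linkage clustering are illustrative choices, and every function name is hypothetical. The sketch returns the sequence of merges a dendrogram would draw rather than the drawing itself.

```python
import math
import re
from collections import Counter


def scrub(text):
    # Simplified "scrubbing": lowercase and keep only runs of word characters.
    return re.findall(r"\w+", text.lower())


def cut(tokens, size):
    # Consecutive chunks of `size` tokens; the last chunk may be shorter.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def tf_idf(chunks):
    """One {word: weight} dict per chunk, using relative frequency * log(n / df)."""
    n = len(chunks)
    df = Counter(word for chunk in chunks for word in set(chunk))
    vectors = []
    for chunk in chunks:
        tf = Counter(chunk)
        vectors.append({w: (c / len(chunk)) * math.log(n / df[w]) for w, c in tf.items()})
    return vectors


def cosine_distance(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (nu * nv)


def single_linkage(vectors):
    """Merge steps of agglomerative clustering: (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([i]) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(cosine_distance(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), round(d, 3)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


text = ("whale sea ship whale sea ship "
        "love rose heart love rose heart "
        "whale ship sea whale ship sea")
chunks = cut(scrub(text), 6)
merges = single_linkage(tf_idf(chunks))
for left, right, dist in merges:
    print(left, right, dist)
```

In the sample, the first and third chunks share their vocabulary and merge first at (approximately) zero distance, while the second chunk joins last. A plotting library, such as SciPy's hierarchical clustering tools, would normally be used to draw the tree itself.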


Netlytic

"Netlytic is a cloud-based text and social networks analyzer that can automatically summarize large volumes of text and discover social networks from online conversations on social media sites". Netlytic is designed to help researchers "understand an online group’s operation, identify key and influential constituents, and discover how information and other resources flow in a network". It supports importing online conversation data, exploring and identifying emerging themes within that data, and automatically visualizing chain networks or person name networks. Netlytic lets researchers examine many social network features: measuring a community's strength, distinguishing prominent actors from peripheral participants, analyzing group perceptions, and "sharing information within a network of trust". The application is best suited to analyzing large online communities.
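
A chain network of the kind Netlytic visualizes can be approximated by recording who replies to whom. The Python sketch below builds a weighted reply network from a hypothetical list of (author, replied-to author) pairs and uses total degree as a crude measure for separating prominent actors from peripheral participants. Netlytic's own network construction and measures may differ; the message data, the function names and the degree threshold are all assumptions.

```python
from collections import defaultdict

# Hypothetical conversation: (author, author being replied to, or None for a new post).
messages = [
    ("ana", None),
    ("ben", "ana"),
    ("cai", "ana"),
    ("ana", "ben"),
    ("dee", "ana"),
    ("eli", "cai"),
]


def reply_network(messages):
    """Weighted directed edges: (replier, original poster) -> number of replies."""
    edges = defaultdict(int)
    for author, reply_to in messages:
        if reply_to is not None and reply_to != author:  # ignore new posts and self-replies
            edges[(author, reply_to)] += 1
    return dict(edges)


def degree(edges):
    """Total weighted degree (replies sent plus replies received) per participant."""
    deg = defaultdict(int)
    for (src, dst), weight in edges.items():
        deg[src] += weight
        deg[dst] += weight
    return dict(deg)


def split_core_periphery(deg, threshold):
    """Participants at or above `threshold` are treated as prominent, the rest as peripheral."""
    core = sorted(p for p, d in deg.items() if d >= threshold)
    periphery = sorted(p for p, d in deg.items() if d < threshold)
    return core, periphery


edges = reply_network(messages)
deg = degree(edges)
core, periphery = split_core_periphery(deg, threshold=2)
print("prominent:", core)
print("peripheral:", periphery)
```

A fixed degree threshold is the simplest possible cut-off; real analyses usually compare several centrality measures before labeling anyone as central or peripheral.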