Software
|
|
Page Index:
Software: Computer Aided Translation
Software: Diagram Display
Software: Fieldwork
Software: Historical Reconstruction
Software: Lexicons
Software: Morphological Analysis
Software: Natural Language Processing
Software: Other Software Tools
Software: Parsers
Software: Phonetic Analysis
Software: Speech Recognition and Synthesis
Software: Taggers
Software: Transcription
Software: Concordances
Software: Software Directories
Speech Analysis (including Clinical Speech Analysis)
|
|
|
Software: Computer Aided Translation
|
- Caesar Machine:
Latin hypertext reader for De Bello Gallico, I - freeware for DOS and Windows.
- Computer Assisted Translation:
STAR Transit XV is a CAT tool that uses a translation memory to speed up and improve the quality of the translation process. Other products of the STAR family are TermStar and WebTerm. TermStar is an extremely scalable solution that can be used on a single-user desktop, on an enterprise database server such as Oracle, Sybase or MS-SQL, or on a Web server. WebTerm lets users create and update enterprise-related terminology over the corporate intranet or the Web using a Web browser.
- EXTRAKT:
Linguistic engine for morphological analysis (lemmatization), generation, translation (of terms for a cross-lingual search), and language identification. Most European languages are covered.
- English Spanish Dictionary:
English Spanish Dictionary for translations, synonyms, antonyms, verb conjugations, thesaurus and idioms builder.
- English Spanish Translator:
English to Spanish and Spanish to English text translation software. Context-sensitive translator. Open text files, word documents and translate them automatically.
- Enhanced Ottoman Turkish Keyboard:
Enhanced Ottoman Turkish Keyboard can translate between Latin and Arabic via Internet Explorer. It can process the transcription and adjoin. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin text into the text field of the program to get instant translation.
- Eyespeak - The World Leader in English Pronunciation:
Do you or your students have difficulty in being understood? EyeSpeak uses patented speech recognition technology to give you direct feedback in visual, audio and written form, so you can follow the relevant instructions to improve your English. An international product with Chinese, Japanese, Korean, French and Spanish instructions. The content is available on CD-ROM or as a web download and provides unlimited training hours, at levels ranging from Beginner to Business.
- Kataku - Machine Translation:
Kataku is the world's first commercial machine translation (MT) product for the Indonesian-English language pair, developed by ToggleText Pty. Ltd.
A free online machine translation service is available at http://www.toggletext.com.
Note: The free trial interface will only translate the first 300 words of text or web content.
- Keyboard of Modern Turkic Languages:
Keyboard of Modern Turkic Languages can translate any text between Latin and Cyrillic, in either direction, via a browser such as Internet Explorer. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin or Cyrillic text into the text field of the program to get instant translation.
- KudoZ:
Not really a dictionary; a very structured forum--with a point system--creates a "human dictionary." Over 8000 professional translators are now participating.
- Lingua::Translit:
Lingua::Translit is a tool that converts text between various writing systems. Wherever possible the transliteration is based on national or international standards (e.g. ISO 9, DIN 31634). Otherwise common national transliteration rules are applied.
Lingua::Translit is provided as an online service as well as an open source Perl library. The module provides a simple to use object-oriented API and can easily be extended by writing intuitive character mappings in a predefined XML language.
- NTT Machine Translation Group Resources:
Japanese-English linguistic term list.
- Natural Language Processing software:
Natural Language (text) Processing software for parsing, spell-checking, machine translation, thesauri, question answering and text attribution for English, German, French, Italian.
- Principal Investigator:
BabelCode enables the human writer to directly author machine-translatable content and guarantees such content be converted to correct, natural and multilingual translation versions, automatically.
- QuickCount 1.0:
QuickCount is a word and line counting tool for freelance translators, translation and localization agencies, transcription agencies, writers, project managers and other professionals who base their quotations and invoices on document text count (word count, line count, gross line count, character count, page count). QuickCount provides an easy-to-use client and invoice module that can export invoices to PDF and Word documents.
- Similis, the Second Generation Translation Memory:
Similis is a full-featured computer-aided translation tool designed for project managers and translators faced with growing demands for both productivity and quality.
Similis analyzes previous translations, generates a translation memory (TM) and applies it to all new projects in order to deliver optimal results in two ways:
* Translators save time when translating recurrent segments, terms, and word groups.
* Translations are more consistent across different documents.
Similis is a second-generation translation memory (TM).
Much more powerful than first-generation TMs, it includes a linguistic analysis engine, uses chunk technology to break down segments into intelligent terminological groups, and automatically generates specific glossaries.
Available in both server and standalone versions, Similis™ meets the needs of large corporations and institutions wanting to better manage both their in-house and outsourced translation projects, as well as those of translation professionals seeking customer loyalty.
- Web Demo of Anaphora Resolution Program for Bulgarian - LINGUA:
This Web-page demonstrates the work of pronominal anaphora resolution software for Bulgarian.
- xlit:
xlit is a program for transliterating text. It allows the user to define a transliteration simply by typing the input strings in one window and the strings to which they are to be mapped in another. It understands Unicode and provides a number of character entry tools. xlit also provides some advanced facilities not found in typical transliteration programs. It is often necessary to restrict transliteration to particular parts of the text. xlit understands a variety of delimiters and if so instructed will transliterate only the regions enclosed by the specified delimiters or only their complements.
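The idea of restricting transliteration to delimited regions (or to their complements) can be illustrated with a minimal Python sketch. This is not xlit's implementation or interface: the function names, the `MAPPING` table and the `<`/`>` delimiters are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical mapping from input strings to output strings (an assumption,
# not an xlit rule set). Longer inputs such as "sh" must win over "s" + "h".
MAPPING = {"sh": "ш", "s": "с", "a": "а", "h": "х"}


def transliterate(text, mapping):
    """Replace every input string with its mapped output, longest match first."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def transliterate_delimited(text, mapping, open_delim, close_delim,
                            complement=False):
    """Transliterate only the delimited regions, or only the text outside
    them when complement is True. Delimiters are kept in the output."""
    region = re.compile(
        re.escape(open_delim) + "(.*?)" + re.escape(close_delim), re.DOTALL
    )
    out = []
    pos = 0
    for m in region.finditer(text):
        outside = text[pos:m.start()]
        inside = m.group(1)
        out.append(transliterate(outside, mapping) if complement else outside)
        out.append(open_delim)
        out.append(inside if complement else transliterate(inside, mapping))
        out.append(close_delim)
        pos = m.end()
    tail = text[pos:]
    out.append(transliterate(tail, mapping) if complement else tail)
    return "".join(out)


if __name__ == "__main__":
    sample = "sash <sasha> sash"
    print(transliterate_delimited(sample, MAPPING, "<", ">"))
    print(transliterate_delimited(sample, MAPPING, "<", ">", complement=True))
```

Sorting the input strings by length before building the pattern is a design choice that lets multi-character inputs take priority over their single-character prefixes.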
|
Software: Diagram Display
|
- Augmented Syntax Diagram (ASD) Editor and Parser:
Augmented Syntax Diagrams (ASDs) represent grammars as networks of nodes and links. They are equivalent to, but simpler than, ATN grammars. This site contains a description of ASDs, free software written in Java for editing and parsing with ASDs, and example grammars, with semantic augmentations, for parts of English.
- Bracket Notation to Tree Converter:
This is a small web application that converts labeled bracket notation into a syntax tree. Use of the application is free. You may save the generated images (.png files) to your hard drive for use in other programs. The application is not limited to English, but the page itself is in English.
- LaTeX for Linguists:
A guide to LaTeX for linguists with information on how to generate attribute-value matrices (AVM), bibliographies, numbered examples, OT tableaux, phonetic symbols, logical symbols and formulae for semantics, and trees using LaTeX.
- Linguistic Tree Constructor:
LTC is a free tool for drawing linguistic syntactic trees, running on Win32 platforms.
- TiGer Search:
Tools for linguistic text exploration; also for Mac OS X
- TreeForm Syntax Tree Drawing Software:
TreeForm is an open source editor for drawing linguistic syntax and semantics trees. Designed for WYSIWYG n-ary tree drawing, reorganizing, saving and printing, this tool greatly speeds up the process of producing syntax trees. TreeForm also lets you make .pdf (with Acrobat Professional or on a Mac), .jpg and .png trees. This Java program works on Mac, Windows and Linux machines.
- Trees 2:
Trees 2 is a Macintosh program for displaying and manipulating syntactic trees and derivations.
* There is now an update of the program, Trees 3, which runs on Windows.*
- Txtkit:
A visual text mining tool for Mac OS X
|
Software: Fieldwork
|
- Alchemist:
The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme.
Alchemist is also a good general tool for sorting and filtering lists of words, because it allows the user to easily use regular expressions applied to words.
- Audiamus:
Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
- EXMARaLDA:
A system and toolset for creating, managing and analysing corpora of transcriptions of spoken language. Consists of an editor for transcriptions in musical score notation, a corpus manager and a search tool.
All file formats are XML-based, which maximizes exchangeability and archivability. Many import and export functions are provided (Praat, ELAN, AGTK, RTF, HTML, SVG etc.).
- Kura:
Kura is a complete system for the handling of linguistic data, especially fieldwork data from small-corpus languages. It allows users to enter texts in any language, analyze those texts and bring the analyzed linguistic facts into relation with each other. Kura includes both a desktop application for easy handling of interlinear texts, lexica and other linguistic data, and a special-purpose webserver for the online presentation of the analyzed data.
- Research Assistant:
Sanchay is a platform for working on languages (especially South Asian) using computers. It is still in the development stage, but components like a text editor with customizable support for languages and encodings, annotation interfaces, etc. are ready.
- Toolbox:
Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.
|
Software: Historical Reconstruction
|
- ALingua:
A Java application that simulates the evolution of a two-language system in a finite population. In particular, ALingua allows one to examine the spatial dynamics of such a system given a set of initial conditions: a distribution of agents, a network defining connections between them, and a language learning algorithm with associated parameter settings.
- ETYMO:
ETYMO is a linguistic formula interpreter for simulating diachronic language evolution (examples for Latin->Spanish).
- Phono: Version 4.1:
Phono is a software tool for developing and testing models of regular historical sound change. If you wish to test a sound-change model for which you have an ordered set of rules and a set of ancestor words, or if you teach about the operation of regular sound change, Phono may be useful to you.
- Wordcorr:
A tool to assist the linguist in comparative phonology. Data entered by keyboard (full IPA) or imported. The linguist decides what forms are comparable, annotates them as such and aligns their segments, then tabulates the resulting correspondence sets into a results structure organized by presumed protosegment and environment. The entire results structure can be reorganized as needed to express an analysis.
|
Software: Lexicons
|
- Alchemist:
The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme.
Alchemist is also a good general tool for sorting and filtering lists of words, because it allows the user to easily use regular expressions applied to words.
- An English Dictionary and Thesaurus in Flash:
A comprehensive lexical reference system with more than 145,000 related terms and 110,000 meanings. A lookup is followed by a trail of related terms. The software covers synonyms, hypernyms, hyponyms, antonyms, related verbs and more. Written in Flash, it makes a journey through the English language a rich multimedia experience.
- CLaRK - an XML-based System for Corpora Development:
CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
- Classics Technology Center:
Etymological Dictionary of Greek & Latin Roots of English words.
- Electronic dictionaries for Windows Polyglossum v. 3.52:
More than 200 dictionary databases for the Polyglossum dictionary program: general-vocabulary dictionaries and sector-specialized dictionaries.
Several language pairs are covered. Multilingual and illustrated dictionaries have also begun to appear.
Dictionaries on business and economics, polytechnic and special technical dictionaries, and dictionaries on medicine, biology, mathematics, computer engineering, etc.
Dictionaries for professional interpreters and dictionaries for education.
English-Russian-English
German-Russian-German
French-Russian-French
Finnish-Russian-Finnish
Spanish-Russian-Spanish
Swedish-Russian-Swedish
Explanatory dictionaries of the Russian language: the dictionary by Vladimir Dahl (in old Russian orthography), the dictionary edited by D. Ushakov, Russian folk proverbs, famous quotations (in Russian), and more.
- IBM LanguageWare:
LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
- LIWC - Linguistic Inquiry and Word Count:
LIWC calculates the percentage of words within each file along 72+ dimensions. Categories include negative emotions (including anger, anxiety, sadness), positive emotions, cognitive processing, standard linguistic dimensions (pronouns, prepositions, articles), and common content categories (death, sex, occupation, etc.). This is a sound program from a psychometric perspective, both in the creation of categories and the validation of the dictionaries. Dictionaries in English, Spanish, German, Dutch, Italian, and Norwegian are available, along with partial dictionaries in Korean, Hungarian, and French.
- Lese- / Rechtschreibprogramme:
Extensive archive of freeware programs for reading and spelling training.
Word lists for the 500 most frequent stem morphemes...
- Lexique:
Lexique 3, available at www.lexique.org, is an open-source database for French. Including Lexique 2 and 3, it describes 55 000 lexical roots and more than 135 000 lexical entries.
- Lexique Pro:
Lexique Pro is an interactive lexicon viewer and editor, with hyperlinks between entries, category views, dictionary reversal, search, and export tools. It's designed to display your data in a user-friendly format so you can distribute it to others.
- Matapuna Dictionary Writing System:
The Matapuna Dictionary Writing System is a free, easy-to-use, web-based, multiuser, multilingual lexicography software system. It assists with many tasks, including dictionary creation and editing, data management, team collaboration, error checking, corpus, publishing, and progress monitoring.
- MorDebe:
MorDebe is a free, large-scale, lexicographically controlled lexicon for European Portuguese, concentrated around inflectional morphology. The lexicon provides inflectional paradigms, word-class, and orthographic information for over 125,000 Portuguese words, with a total of around 1.5 million word-forms. On top of this, the database provides information about derivational morphology and orthographic variation for a large number of lexical items. The database also contains words of other national variants of Portuguese (Brazil, Angola, Cabo Verde, etc.); all words belonging to these variants are explicitly marked as such.
- MorphoLogic:
MorphoLogic is a CEE specialist in language technology. Its products include proofing tools such as spell checkers, hyphenators and thesauri for many languages, as well as intelligent bilingual dictionaries and multi-dictionary systems with translation support features.
- Online English Spanish Dictionary:
Free online English Spanish Dictionary with translations, synonyms, definitions and usage examples.
- TAMS:
Text Analysis Markup System; for Linux and Mac OS X
- Texai Lexicon:
The Texai lexicon is a merging of WordNet 2.1, the CMU Pronouncing Dictionary, Wiktionary, and the OpenCyc lexicon. The format is RDF, N3 or TriG. Included are entries for lemmas, word forms, word senses, sample phrases and ARPABET pronunciations. A documentation file is available as a separate download. Only the TriG version contains context.
- TshwaneLex Lexicography Software:
TshwaneLex is a professional software application for the compilation of monolingual, bilingual or semi-bilingual dictionaries. TshwaneLex contains various innovative features designed to optimise the process of producing dictionaries, and to improve consistency and quality of the final dictionary product.
TshwaneLex supports Unicode throughout, allowing it to handle virtually all of the world's languages, and includes features such as immediate article preview, customisable fields, automatic cross-reference tracking, automated lemma reversal, online and electronic dictionary modules, export to MS Word format, and teamwork (network) support.
- WordNet:
A WordNet client for Mac OS X
- WordSmith Tools:
A suite of PC software for lexical analysis of corpora in a very wide variety of languages. Offers concordancing, wordlisting, key words analysis and a number of other utilities. WordSmith 3.0 (OUP, 1999) handles Windows 3.1 and better and is restricted to ASCII/ANSI text; WS 4.0 (2002) requires Windows 98B or better and handles Unicode as well as ASCII/ANSI text.
Version 4.0 was issued in 2004. This is a complete new edition with many limitations removed and numerous additional features, such as sound concordancing, use of Unicode, tools for obtaining text from the Internet, etc.
- coolvocab:
Helps you to memorize foreign words, using your own word lists. For students or travelers. Ergonomic (few clicks).
- jMRC - MRC Psycholinguistic Database Java Interface:
jMRC is a Java interface for querying the MRC Psycholinguistic Database, which allows you to get psycholinguistic information about more than 150,000 words over 14 linguistic features. Features include concreteness, familiarity, frequency of use, age of acquisition, number of phonemes, etc.
- msort:
msort is a sophisticated sort utility. It differs from typical sort utilities in providing greater flexibility in parsing the input into records and identifying key fields and greater control over the sort order. Records need not be single lines of text but may be delimited in a number of ways. Key fields may be selected by position in the record, by character ranges, or by matching a regular expression to a tag. For each key an arbitrary sort order may be specified together with multigraphs, exclusions, and regular expression substitutions. In addition to the usual lexicographic and numerical orderings, msort supports sorting by date, time, and string length. Lexicographic keys may be reversed, allowing the construction of reverse dictionaries. Any or all keys may be optional. For optional keys, the user may specify how records missing the key field should compare to records in which the key field is present. msort fully supports Unicode.
- theConcept:
An indexing and text searching tool for Mac OS X
|
Software: Morphological Analysis
|
- AGTK:
An annotation graph toolkit. Also available for Mac OS X.
- Alchemist:
The original purpose of Alchemist is to allow you to read in raw text files and create morphological gold-standards in XML format. Using Alchemist, you can identify morphemes, along with a number of important characteristics of the morphemes, such as whether they are roots or affixes, the degree of analyst certainty, and allomorphs of the morpheme.
Alchemist is also a good general tool for sorting and filtering lists of words, because it allows the user to easily use regular expressions applied to words.
- EXTRAKT:
Linguistic engine for morphological analysis (lemmatization), generation, translation (of terms for a cross-lingual search), and language identification. Most European languages are covered.
- Emdros text database engine for analyzed or annotated text:
Emdros is an Open Source text database engine specializing in linguistic analyses of text. Emdros comes with a powerful query language for asking linguistically relevant questions of the data.
- Helpful add-in to MS Word: repetition counter and approximate matching search tools:
Fore Words is a plugin (add-in) for Microsoft Word that provides some helpful tools for text analysis. The add-in currently contains two items: Repetyler and K-Diff Search.
Repetyler counts all repetitions (of words or phrases) in a text. This program can help to improve (or just examine) the writing style in business documentation, literary text, correspondence, etc. Excessively frequent constructions and so-called filler words can be invisible at first glance, yet badly affect the reader's impression. On the other hand, repetition analysis can help you build a true portrait of a person or find implicit messages in formal language. Webmasters can find Repetyler useful when analyzing word density and choosing keywords for search engines.
The professional version, Fore Words Pro, can additionally count repetitions of word parts. This way you can find repeatedly used words in all their forms (particularly, with different suffixes). The length of the word part being searched is configurable. Another configuration parameter is the position of the word part: this can be the beginning of the word (prefix) or the middle part.
K-Diff Search is search by approximate matching.
- IBM LanguageWare:
LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
- Lese- / Rechtschreibprogramme:
Extensive archive of freeware programs for reading and spelling training.
Word lists for the 500 most frequent stem morphemes...
- Linguistica:
Linguistica is an ongoing research project developing software for the unsupervised learning of natural language morphology. It takes an untagged text corpus as its input, and attempts to determine the stems, affixes, and morphological structure of the words with no prior knowledge of the language.
- Morfix-Meister:
A tool for recognizing word structures by working with frequent word components, grouped by spelling patterns.
A German dictionary-like tool, sorted by morphemes.
- Research Assistant:
Sanchay is a platform for working on languages (especially South Asian) using computers. It is still in the development stage, but components like a text editor with customizable support for languages and encodings, annotation interfaces, etc. are ready.
- TAMS:
Text Analysis Markup System; for Linux and Mac OS X
- TiGer Search:
Tools for linguistic text exploration; also for Mac OS X
- XLE:
The Xerox Linguistics Environment is a tool for parsing and generating Lexical Functional Grammars. The software runs on Linux, Unix, Solaris and Mac OS X.
- minpair:
Generates a complete list of minimal pairs from a wordlist. Minpair accepts input in Unicode and optionally finds pairs differing in a single transposition or insertion/deletion. Multigraphs (sequences of characters treated as a single segment) may be defined. The basic program is a command-line program. It may be driven by an optional GUI.
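The kind of search described for minpair can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not minpair's code or command-line interface: the function names and options below are hypothetical, and the pairwise comparison is the simplest possible approach rather than an efficient one.

```python
from itertools import combinations


def segment(word, multigraphs=()):
    """Split a word into segments, treating each listed multigraph
    (for example "ch") as a single segment. Longest multigraph wins."""
    units = sorted((u for u in multigraphs if u), key=len, reverse=True)
    segs, i = [], 0
    while i < len(word):
        for u in units:
            if word.startswith(u, i):
                segs.append(u)
                i += len(u)
                break
        else:
            segs.append(word[i])
            i += 1
    return tuple(segs)


def one_substitution(a, b):
    """True if a and b have equal length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def one_indel(a, b):
    """True if deleting one segment from the longer word yields the shorter."""
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def one_transposition(a, b):
    """True if a and b differ only by swapping two adjacent segments."""
    if len(a) != len(b):
        return False
    d = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (len(d) == 2 and d[1] == d[0] + 1
            and a[d[0]] == b[d[1]] and a[d[1]] == b[d[0]])


def minimal_pairs(words, multigraphs=(), indel=False, transposition=False):
    """Return all pairs of words that differ in a single segment, optionally
    also pairs differing by one insertion/deletion or one transposition."""
    segmented = {w: segment(w, multigraphs) for w in dict.fromkeys(words)}
    pairs = []
    for (w1, s1), (w2, s2) in combinations(segmented.items(), 2):
        if (one_substitution(s1, s2)
                or (indel and one_indel(s1, s2))
                or (transposition and one_transposition(s1, s2))):
            pairs.append((w1, w2))
    return pairs


if __name__ == "__main__":
    wordlist = ["pat", "bat", "pit", "chat", "cat", "at"]
    print(minimal_pairs(wordlist))
    print(minimal_pairs(wordlist, multigraphs=("ch",)))
    print(minimal_pairs(wordlist, indel=True))
```

Comparing every pair of words is quadratic in the size of the wordlist, so this sketch is only suitable for small lists.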
|
Software: Natural Language Processing
|
- AGFL Grammar Work Lab:
A collection of software systems for Natural Language Processing, based on the AGFL-formalism (Affix Grammars over Finite Lattices).
- AGTK:
An annotation graph toolkit. Also available for Mac OS X.
- Alembic Workbench (AWB) annotation environment:
AWB is an annotation environment developed by the MITRE Corporation. It is available by way of a no cost public license.
- CLaRK - an XML-based System for Corpora Development:
CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
- Fink Text packages for Mac OS X:
Text-related Unix packages for Mac OS X. Another source for native Mac OS X software is osx.hyperjeff.net/Apps.
- GATE:
General Architecture for Text Engineering. A domain-specific software architecture and development environment that supports researchers in Natural Language Processing and Computational Linguistics and developers who are producing and delivering Language Engineering systems.
- Inputlog:
Inputlog is a freeware research tool that enables researchers to log writing processes (Windows) and analyse them.
* Inputlog records the data of a writing session in Microsoft® Word;
* Inputlog generates data files for statistical, text, pause, mode and revision analyses;
* Inputlog plays the recorded session at different speeds.
- Intellexer - Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing:
The Intellexer linguistic platform supports the development of custom-built search engines, knowledge management tools, natural language processing systems and other intelligent software.
- JavaRAP:
JavaRAP is a standalone, publicly-available implementation of the Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). The RAP algorithm resolves third person pronouns, lexical anaphors, and identifies pleonastic pronouns. The implementation uses the standard, publicly available Charniak (2000) parser as input, and generates a list of anaphora-antecedent pairs as output. Alternately, an in-place substitution of the anaphors with their antecedents can be produced. It could be used as a reference to benchmark other anaphora resolution algorithms or systems; or to provide anaphora resolution function as needed by other NLP applications.
- KorNet 1.5:
Semantic (associative) network programming system (described in Russian).
- Mind-1.1:
An original, linguistic theory of mind implemented in JavaScript for tutorial purposes and in Forth for robots. Clicking on the Mind-1.1 link causes the artificial Mind to travel across the Internet and come alive in your Microsoft Internet Explorer Web browser. Options include a default tutorial mode, a printed-transcript mode for recording natural-language-generation (NLG) sessions, and a troubleshoot mode for debugging any malfunction of the still evolving software. The documentation of the thirty-four (34) AI Mind-modules was published in November 2002 as the 34 chapters of an artificial intelligence textbook for computer science students.
- Natural Language Processing software:
Natural Language (text) Processing software for parsing, spell-checking, machine translation, thesauri, question answering and text attribution for English, German, French, Italian.
- Natural Language Software Registry:
A concise summary of the capabilities and sources of a large amount of natural language processing (NLP) software available to the NLP community.
- Natural Language Toolkit:
NLTK, the Natural Language Toolkit, is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is ideally suited to students who are learning NLP (natural language processing) or conducting research in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. NLTK is free software, written in Python, and released under an open source license.
- Pertinence Summarizer:
PS is multilingual, multidomain text summarization software that can summarize a wide variety of file formats to a length specified by the user.
Languages supported: French, English, Spanish, German, Portuguese, Italian, Japanese, Chinese and Korean
http://www.pertinence.net/index_en.htm
- SFST Tools:
The Stuttgart Finite State Transducer (SFST) tools are an efficient and easy-to-use platform for the implementation of morphological analysers and other applications which are based on finite-state technology. The implementation of the SFST tools is based on a C++ library. The SFST tools are distributed under the GNU General Public License.
- Saransk:
Interactive natural (Russian) language software (NLP) system to talk to IBM 360/370 OS utilities (described in Russian).
- TiGer Search:
Tools for linguistic text exploration; also for Mac OS X
- VisualText:
Integrated development environment for NLP. Builds multi-pass, multi-strategy analyzers. The NLP++ programming language and Conceptual Grammar hierarchical knowledge base support grammars, patterns, lexicons, ontologies, and heuristics. Academic licensing available.
- What's Wrong With My NLP?:
A visualizer and graphical diff for NLP problems. Displays syntactic and semantic trees and
- XLE:
The Xerox Linguistics Environment is a tool for parsing and generating Lexical Functional Grammars. The software runs on Linux, Unix, Solaris and Mac OS X.
- openNLP:
The Open Natural Language Processing website with many software packages that also run on Mac OS X.
- theConcept:
An indexing and text searching tool for Mac OS X
|
Software: Other Software Tools
|
- Amalgam:
Java applet for online testing of Malaga grammars.
- Analysis:
A program which allows several types of text analysis. For Windows or Unix.
- Bibliographix:
A reference manager. Available in free (basic) and pro (advanced) versions. The latter adds the option of importing references from diverse OPACs, including the Library of Congress, GBV and some German libraries (the software is German).
It also features a number of export formats (BibTeX, Reference Manager, Endnote) and has a direct module for setting references in Word.
- Bigram Statistics Package:
This is an easy-to-use suite of Perl tools for counting and analyzing bigrams in text.
- Bitstream:
Fonts, international language typefaces, font CDs, and custom type design services.
- CINTIL Concordancer and Corpus:
CINTIL Online Concordancer is now available at: http://cintil.ul.pt
This is an online concordancing service that supports the research usage of the CINTIL Corpus.
CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese, developed at the University of Lisbon. At present it is composed of 1 million annotated tokens, manually verified by linguistic experts.
The annotation comprises information on part-of-speech, on lemma and inflection of tokens from open classes, on multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and on multi-word proper names (for named entity recognition).
Feedback is very welcome, to cintil@di.fc.ul.pt
The corpus is developed and maintained at the University of Lisbon by the REPORT group of the Centro de Linguística da Universidade de Lisboa, in cooperation with the NLX-Natural Language and Speech Group of the Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa.
The concordancer is a free online service; a Portuguese-language version is available at http://cintil.ul.pt/pt/.
- Constructor - Portuguese Corpus Linguistics:
Interactive, intuitive online JavaScript tool for building queries to search Portuguese corpora on the Web.
- E-meld School of Best Practices Tool Room:
The Tool Room provides information about hardware and software tools available for linguists, many of which will help you to conform to Best Practice. Tools are divided into the categories of Software and Hardware.
The software area houses a database of software recommended by linguists. In this section you can browse for software based on its function (Concordancers, Lexicon Management, etc), read the comments of other linguists, and share your own opinions.
The hardware area contains a database of information on many types of hardware that is useful in the field, from digital recorders to solar panels.
- Ergane:
A freeware multilingual dictionary programme (Win 3.1) using Esperanto as auxiliary language. Vocabularies for more than 40 languages available.
- Ethnologue - Languages of the world:
A catalogue of more than 6,700 languages spoken in 228 countries.
- Fluid Construction Grammar:
A fully operational grammar formalism and implementation for representing, learning and applying lexical and grammatical inventories. FCG opens many new research directions for linguists, especially those interested in cognitive, computational and evolutionary linguistics, and researchers in Artificial Intelligence.
- Gokturkish Keyboard:
Gokturkish Keyboard can transliterate between Latin and Gokturkish via Internet Explorer. You can use the program to transfer the text to word processors such as Word for further editing. Additionally, you can copy and paste Latin text into the text field of the program to get instant transliteration.
- Helpful add-in to MS Word : repetition counter and approximate matching search tools:
Fore Words is a plugin (add-in) for Microsoft Word that provides some helpful tools for text analysis. Currently the add-in contains two items: Repetyler and K-Diff Search.
Repetyler counts all repetitions (words or phrases) in a text. This program can help to improve (or just examine) the writing style in business documentation, literary text, correspondence, etc. Excessively frequent constructions and so-called filler words can be invisible at first glance, but they badly affect the reader's impression. On the other hand, repetition analysis can help you to build a true portrait of a person or find implicit messages in formal language. Webmasters can find Repetyler useful when analyzing word density and choosing keywords for search engines.
The professional version, Fore Words Pro, can additionally count repetitions of word parts. This way you can find repeatedly used words in all their forms (particularly, with different suffixes). The length of the word part being searched for is configurable. Another configuration parameter is the position of the word part: this can be the beginning of the word (prefix) or the middle part.
K-Diff Search is search by approximate matching.
- IBM LanguageWare:
LanguageWare is a software component that provides linguistic processing for a variety of products and solutions in more than 20 languages. It comprises a Java library with a set of language resources. The library encodes the language models, and the resources (dictionaries) encode the lexical entries for each language and contain language-specific processing logic, such as logic for handling decomposition, spelling correction, morphology, hyphenation, language identification, etc.
- IBM LanguageWare Miner for Multidimensional Socio-Semantic Networks:
IBM LanguageWare's library for lexical analysis, disambiguation and ontology/multi-dimensional network based semantic analysis and information mining.
- ISIS:
Indian Scripts Input System (ISIS) is a set of easy-to-use, mnemonic software keyboards for Indian scripts. ISIS is Unicode-compliant and covers almost all major Indian scripts with a single keyboard layout.
- KMap IME:
KMap IME is an OS-independent input method. Using any keyboard layout, it currently supports: Arabic, Armenian, IPA, Aymara, Azeri, Belarusian, Bengali, Berber, Breton, Bulgarian, Catalan, Cherokee, Cimbrian, Comanche, Croatian, Czech, Dakelh, Danish, Devanagari, Dutch, Esperanto, Estonian, Ethiopic, Faroese, Farsi, Finnish, French, Georgian, German, Greek, Guarani, Gurmukhi, Hanunoo, Hawaiian, Hebrew, Hungarian, Icelandic, Inuktitut, Kannada, Kazakh, Latvian, Lithuanian, Malayalam, Maori, Korean, Mongolian, Nahuatl, Navajo, Norwegian, Occitan, Ogham, Oriya, Persian, Piemontese, Polish, Romanian, Russian, Sanskrit, Serbian, Slavic, Slovenian, Spanish, Syriac, Tamil, Telugu, Thai, Tibetan, Ukrainian, Urdu, Vietnamese, Welsh, Yiddish
- KickKeys:
KickKeys is a software tool for linguists. It allows the user to write any language using the regular English computer keyboard without memorizing difficult key sequences. It allows transliteration (type-as-you-pronounce) and remapping of keyboards. Thus, if the user wants to write Ancient Greek or Hebrew in English Windows, KickKeys is designed for exactly that.
KickKeys allows users to specify their own key mappings, change existing ones, and use any font they like. These features work in WordPad, Microsoft Word, Outlook, Outlook Express, Excel, Frontpage, Powerpoint, Eudora and other common Windows applications.
It ships ready with key maps and fonts for several languages like Assamese, Bengali, Bulgarian, Belarusian, French, Farsi, German/Scandinavian, Hindi, Italian, Portuguese, Russian, Spanish, Tamil and Ukrainian.
It also comes with graphical tools that allow the user to build keymaps for all other languages and fonts.
It even supports typing right-to-left languages like Farsi in English Windows.
- LCard at Lingresource.Com:
LCARD - A BRAND NEW INFO PRESENTER!
LCard Is a Unique Aid to Organize and Present Your Information
LCard is software designed specifically for humanists and classical scholars, including students writing end-of-term course papers or diploma theses, teachers and researchers writing PhD theses, articles and monographs, and anyone who needs to present homogeneous information (lists of literature, authors or sources). Translators and translation studies experts will especially benefit from this program, because they regularly need to present pairs of original and translation cards when analyzing texts.
- LIWC - Linguistic Inquiry and Word Count:
LIWC calculates the percentage of words within each file along 72+ dimensions. Categories include negative emotions (including anger, anxiety, sadness), positive emotions, cognitive processing, standard linguistic dimensions (pronouns, prepositions, articles), and common content categories (death, sex, occupation, etc). This is a sound program from a psychometric perspective, both in the creation of categories and the validation of the dictionaries. Dictionaries in English, Spanish, German, Dutch, Italian, and Norwegian are available, as are partial dictionaries in Korean, Hungarian, and French.
- Lextools:
Lextools is a package of tools for creating weighted finite-state transducers from high-level linguistic descriptions.
- LingPipe Java API:
LingPipe is a Java API for linguistic processing tasks that include: tokenization, sentence detection, part-of-speech tagging, phrase chunking, entity detection, and within-document coreference. It also has efficient language-model-based classifiers and noisy-channel spelling correction. Source included.
- Linguist-GRID.org:
Linguist-GRID is a free internet database for grammaticality rating tests. It is intended to make grammaticality rating tests a lot more fun for linguists by automating most of the work: all you have to do is to enter your test sentences into the database, and assign one or more scales on which the sentences should be rated. After that, the test subjects fill out an online form, and Linguist-GRID stores their ratings in the database and creates a test report that describes and analyzes the data. Everything is published instantly on the internet so that other linguists can see your test setup, your data, and your analyzed results.
- MiniJudge:
A free, open-source software tool for designing, running, and analyzing small-scale experiments on linguistic judgments.
- MtRecode:
A Character Conversion program.
- Multilingual, Fast Typing: Shabda-Brahma ET-Feel Word-Storm Processor (SB):
This auto-suggesting text processor is designed for typing the raw first draft of a text into the PC. As you type the first few letters of a long word (or of a repeated phrase or clause), SB tries to guess the intended word or phrase and displays two auto-suggestions on the screen. If SB has guessed correctly, you just press the Insert or Alt key to have the suggestion typed automatically. New auto-suggestions can also be learnt automatically from the user's written texts (along 20 user-paths or 80 languages). You can also define and use up to 10,000 direct 3-key shorthands. SB can auto-form common symbols and even South Asian 'conjunct consonants' (juktakshar), and it displays font-specific onscreen keyboards, which makes it even more useful for non-English typing. SB exports its typed text as HTML output, which can be copied from any Internet browser into any word processor for final use. SB v5.8.3 is simpler, faster and practically freeware.
- MyFontKeys:
This software allows you to use any fonts and customize the keyboard within any application.
- Online text summarization for French texts:
Pertinence is automatic text summarization software that lets users quickly and easily extract the important information from a text. Pertinence acts as on- and off-ramps to the information superhighway, allowing easy access to the relevant information. This convenience is essential to several tasks, such as effectively accessing very large and unstructured collections like the World Wide Web, an intranet, or your own text databases stored on a computer.
The Pertinence demo is free (French texts) for the document types ASCII, HTML and PDF.
Try the free online automatic text summarization: http://www.pertinence.net/register_en.html
- Personality Recognizer:
The Personality Recognizer is a Java command-line application that reads a set of text files and computes estimates of personality scores along the Big Five dimensions: extraversion, emotional stability, agreeableness, conscientiousness and openness to experience. The program is based on statistical models that were shown to predict personality scores significantly better than a constant baseline (Mairesse & Walker, 2006).
- R-Varb implemented in R:
Those looking for statistical software should note that R is useful for many things in addition to corpus linguistics. I have used R or the proprietary program on which it is modelled, S, for phonetics since 1982. There is also a Varbrul-like variable rule package called R-Varb implemented in R:
http://ella.slis.indiana.edu/%7Epaolillo/projects/varbrul/rvarb/
- RAM 4.0:
A major upgrade has appeared for the Reading Acceleration Machine, a freeware tachistoscope for Windows. New features in version 4.0 include random review, automatic random review, looping, multiple bookmarks, an acceleration function, the ability to copy particular words or lines displayed to another file, and larger possible time-settings.
- Reading Acceleration Machine:
Tachistoscope for flashing lines of text to the monitor screen at the exact rate desired.
- Redet:
Redet is a tool for performing regular expression matching and substitution. It is useful for performing complex searches of corpora and lexica as well as for transforming data. It permits the user to define named character classes and to take their intersection, with the result that it is possible to run searches on feature matrices. It provides considerable assistance for the user, including a palette of regular expression constructions, a history list that persists across sessions, extensive help, and a set of character entry tools including IPA charts and a simple facility for defining custom character charts. Numerous aspects of the program are configurable. Unicode is fully supported.
- SenseClusters:
SenseClusters is a suite of Perl programs that supports unsupervised clustering of similar contexts. It relies on its own native methodology, and also provides support for Latent Semantic Analysis.
SenseClusters is a complete system that takes users from preprocessing of text to clustered output. It supports the selection of features, the creation of various kinds of context representations, dimensionality reduction via Singular Value Decomposition, clustering, and analysis of results.
- SignStream:
A Macintosh application to assist with the linguistic analysis of video-based language data.
- System Quirk:
The System Quirk family of applications is designed to aid in the production and maintenance of texts and terminologies. These applications are of specific relevance to computational linguists and language engineers.
- TeXShop:
A TeX previewer for Mac OS X, written in Cocoa.
- Tuebingen German Language Resources:
1. The Tuebingen Treebank of Written German (TueBa-D/Z) - a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung' (taz) from May 3rd to May 7th 1999.
2. The Tuebingen Partially Parsed Corpus of Written German (TuePP-D/Z) - a collection of articles from the taz newspaper which have been automatically annotated with clause structure, topological fields, and chunks, in addition to more low-level annotation including parts of speech and morphological ambiguity classes.
- TextCat Language Guesser:
Determines the language of a given text. Supports more than 65 languages. Free.
- Textanz:
Textanz builds a list of word and phrase frequencies from a text. This information allows you to detect excessive use of words and expressions. Such stylistic control is no less important than the now-standard spell-checking function. It is especially advisable to check business documentation: the first impression that a reader gets from your commercial offer, project, resume, contract, report, etc. depends in many respects on writing style. It is also useful to analyze frequencies in your informal writing, and generally in any text that you intend to give someone to read.
When you are in the role of reader, Textanz helps again. The most frequently used phrases suggest which idea was central for the author at the moment of writing, and may reveal implicit psychological aspects. A word frequency list is part of the so-called stylistic portrait of the writer. In linguistics research, this is often used for identification of authorship (somewhat like handwriting analysis).
Developers and webmasters can also benefit from Textanz when choosing keywords for a web page or searching for repeated fragments of program source code.
- Topicalizer:
Topicalizer is a text analysis, topic extraction and keyword analysis tool. Based on methods of computational linguistics it provides various analyses for a given URL or plain text. These comprise, amongst others, language recognition, lexical density, keywords, collocations, word and phrase frequencies, readability and a short abstract.
Topicalizer also is able to find similar pages according to the keywords it has extracted from a document.
Moreover, Topicalizer provides an API for use by external applications.
- Unicode-Keyboard-Layouts for Win2k/XP:
Multilingual keyboard layouts for all European languages written in Latin or Cyrillic script, and for IPA, based on the standard German keyboard layout. Easy to install. Note: All descriptions are only in German.
- Wazéma Ethiopian Computer Writing System:
Wazéma System is a Windows (9x/Me/NT/2000/XP) and Apple Macintosh (System7-Mac OS8-9.5) compatible computer writing system for Amharic and all Ethiopian languages. It is freely available from:
http://members.aol.com/W4z5m4/wazema.html
The system includes a keyboard system based on the Ethiopian syllabary, six professional quality True Type font families, gemination marks, the full musical notation of the Ethiopian Orthodox Tewahdo Church, etc.
- WordMetry:
WordMetry is a tool for analysis of word statistics, stylometrics, author identification, corpus linguistics, opinion polls, media focus, and prediction. It supports web-based text retrieval and analysis as well as traditional locally-based static text statistics.
- World Language Mapping System:
Data set of worldwide language homeland areas (polygons) and point locations for use in Geographic Information Systems (GIS). Dataset developed jointly by SIL and GMI maps all languages of the 14th Edition Ethnologue, and includes substantially all of the data of the published Ethnologue as GIS attribute fields.
- jMRC - MRC Psycholinguistic Database Java Interface:
jMRC is a Java interface for querying the MRC Psycholinguistic Database, which allows you to get psycholinguistic information about more than 150,000 words over 14 linguistic features. Features include concreteness, familiarity, frequency of use, age of acquisition, number of phonemes, etc.
- unidesc:
This package consists of four programs for finding out what is in a Unicode file. They are useful when working with Unicode files when one doesn't know the writing system, doesn't have the necessary font, needs to inspect invisible characters, needs to find out whether characters have been combined or in what order they occur, or needs statistics on which characters occur.
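Several entries in this section (Bigram Statistics Package, Textanz, Fore Words) revolve around counting words and word pairs in a text. As a rough illustration of the idea only, here is a minimal Python sketch of bigram counting. It is not code from any of the listed packages; the function name `bigram_counts` and the simple `\w+` tokenization are assumptions made for this example.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in a text.

    Hypothetical helper for illustration: tokens are lowercased runs of
    word characters, which is cruder than what real tools use.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Pair each token with its successor and tally the pairs.
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    counts = bigram_counts("The cat saw the cat")
    for pair, n in counts.most_common():
        print(pair, n)
```

The listed tools add their own tokenization rules and statistics on top of raw counts like these; consult each package's documentation for what it actually computes.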
|
Software: Parsers
|
- CLaRK - an XML-based System for Corpora Development:
CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
- Grammar Play:
Grammar Play is a syntactic parser in Prolog for Brazilian Portuguese. The parser analyses simple declarative sentences (i.e. sentences containing only one verb) in Brazilian Portuguese and outputs their constituent structure both as labeled brackets and as a PS structure. The grammar of Grammar Play was developed based on the model proposed by X-bar syntax theory. This grammar was implemented in Prolog and the graphic interface of the parser was built using Delphi.
- Link Parser v3.0:
A syntactic parser of English, based on link grammar. Online demonstration, documentation, and downloadable software and API.
- TiGer Search:
Tools for linguistic text exploration; also for Mac OS X
|
Software: Phonetic Analysis
|
- AGTK:
An annotation graph toolkit. Also available for Mac OS X.
- AlteruPhono:
AlteruPhono is open-source software for developing and testing models of regular phonetic sound changes, simulating the diachronic evolution of a word. Its rules are developed using the common combination of features such as point of articulation and vowel roundness.
- Audiamus:
Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
- Bust A Vowel:
Educational software that helps to learn the whole set of human vowels, using high quality sound and IPA characters.
It's perfect for students of phonology and phonetics.
This software is 100% freeware with no limitations.
- FreP: Frequency in Portuguese:
FreP is an electronic tool that allows the extraction of frequency information of Portuguese phonological units at the word-level and below. It runs on written texts, following the current orthographic conventions. FreP was conceived as a public domain tool, with the restriction of being used for scientific, non-commercial, purposes.
FreP emerged from a joint (ongoing) project involving Marina Vigário (Univ. Minho), Fernando Martins (Univ. Lisboa/ILTEC) and Sónia Frota (Univ. Lisboa), which started in July, 2004.
To get/update FreP, please write to fmartins@fl.ul.pt.
- Interactive Sagittal Section:
Displays sagittal sections and IPA transcriptions for user-specified lip and tongue positions, using JavaScript.
- NORM: A Vowel Normalization Suite:
NORM is a web-based vowel normalization and plotting package. NORM allows users to normalize formant data using a wide variety of published procedures (Nearey, Lobanov, a Bark difference method, etc). The processing is implemented in R and the R script is available for download and customization.
- Phono:
Software tool for creating and testing models of regular historical sound change.
This version (4.1) runs on Windows (Version 3.3 was for DOS).
- R-Varb implemented in R:
Those looking for statistical software should note that R is useful for many things in addition to corpus linguistics. I have used R or the proprietary program on which it is modelled, S, for phonetics since 1982. There is also a Varbrul-like variable rule package called R-Varb implemented in R:
http://ella.slis.indiana.edu/%7Epaolillo/projects/varbrul/rvarb/
- Sanchay:
Sanchay is an open source platform for working on languages, especially South Asian languages, using computers and also for developing Natural Language Processing (NLP) or other text processing applications. It consists of various tools and APIs for this purpose. It is still in the development stage and the design has not yet stabilized, but components like a text editor with customizable support for languages and encodings, annotation interfaces, etc. were first released as an experimental version (0.1) on Sourceforge.net. The next version (0.2) has been available on the Internet and has also been released on Sourceforge.net, along with the latest version (0.3). It is meant to be complementary to the other existing NLP tools and libraries.
Some of the components in the released version are: Syntactic annotation interface, generalised table and tree components, SSF (Shakti Standard Format) API, feature structure API, parallel corpus markup interface, customizable language and encoding support, Sanchay text editor, language and encoding identification, file splitter and format converter, task setup generator (only for syntactic annotation), a simple but powerful data structure called Properties Manager along with a GUI for purposes like customization of applications, a find/replace/extract tool, a CRF based automatic annotation tool, and a tree visualizer for phrase structure and dependency relations. User documentation has been provided for some of these components. More will be added soon. Some API documentation for programmers will also be provided later.
Many other components are in the pipeline. Hopefully other people will get involved with the development so that Sanchay can provide much needed support for South Asian languages for as many purposes as possible.
Sanchay has an object oriented architecture where the emphasis is on a design based on things like modularity, reusability, extensibility and maintainability. The implementation is purely in Java, which means it is platform independent and can be used on Windows as well as Linux without needing any extra setup except installing JDK or JRE.
- SndBite:
SndBite is a specialized audio editor, designed for breaking large recordings into smaller components with great efficiency. Special features include:
*Multiple simultaneous views of the waveform at different resolutions.
*The ability to position window edges at transitions between sound and silence.
*Automated setting of cut points at zero-crossings.
*Automatic filename generation easily controlled by the user.
*Optional automatic playback on window motion.
*Logging of each write.
- Texai Lexicon:
The Texai lexicon is a merging of WordNet 2.1, the CMU Pronouncing Dictionary, Wiktionary, and the OpenCyc lexicon. The format is RDF, N3 or TriG. Included are entries for lemmas, word forms, word senses, sample phrases and ARPABET pronunciations. A documentation file is available as a separate download. Only the TriG version contains context.
- WaveSurfer:
A tool suited for a wide range of tasks in speech research and education.
- pause:
Pause determines the location of silences in an audio file for use in fragmentation of large recordings, studies of pause duration, and the like.
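Tools such as pause and SndBite locate stretches of silence in a recording. One common approach, sketched below in Python, is a simple amplitude threshold with a minimum duration. This is only an illustration and is not taken from either program; the function name `find_silences`, the default threshold and the default minimum duration are all assumptions.

```python
def find_silences(samples, rate, threshold=0.02, min_duration=0.2):
    """Return (start_sec, end_sec) spans where |amplitude| < threshold.

    Hypothetical sketch: `samples` is a sequence of floats in [-1, 1],
    `rate` is the sampling rate in Hz. Spans shorter than `min_duration`
    seconds are ignored.
    """
    min_len = int(min_duration * rate)
    spans = []
    start = None  # index where the current quiet run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                spans.append((start / rate, i / rate))
            start = None
    # Close a quiet run that extends to the end of the recording.
    if start is not None and len(samples) - start >= min_len:
        spans.append((start / rate, len(samples) / rate))
    return spans


if __name__ == "__main__":
    signal = [0.5] * 5 + [0.0] * 4 + [0.5] * 3 + [0.0] + [0.5] * 2
    print(find_silences(signal, rate=10))
```

Real recordings usually need a smoothed energy measure rather than raw sample amplitude, since speech waveforms cross zero constantly; the threshold and duration values above would have to be tuned per recording.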
|
|
Software: Taggers
|
- A tagger for German:
A tagger for German (with interactive online demo version).
- Adsotrans Chinese-English Annotation Engine:
Adsotrans is a collaborative open source Chinese-English annotation project designed to assist learners of Chinese as a second language. It comes with a large database of semantically-tagged Chinese word information.
- CLAWS part-of-speech tagger:
POS tagging software for English text, CLAWS (the Constituent Likelihood Automatic Word-tagging System).
- CLaRK - an XML-based System for Corpora Development:
CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
- WordStat:
WordStat is a text analysis module specifically designed to study textual information such as responses to open-ended questions, interviews, titles, journal articles, public speeches, electronic communications, etc. WordStat may be used for automatic categorization of text using a dictionary approach or text mining. WordStat can apply existing categorization dictionaries to a new text corpus. It also may be used in the development and validation of taxonomies. When used in conjunction with manual coding, this module can provide assistance for a more systematic application of coding rules, help uncover differences in word usage between subgroups of individuals, assist in the revision of existing coding using KWIC (Keyword-In-Context) tables, and assess the reliability of coding by the computation of inter-rater agreement statistics. WordStat includes numerous exploratory data analysis and graphical tools that may be used to explore the relationship between the content of documents and information stored in categorical or numeric variables such as the gender or the age of the respondent, year of publication, etc. Relationships among words or categories as well as document similarity may be identified using hierarchical clustering and multidimensional scaling analysis. Correspondence analysis and heatmap plots may be used to explore the relationship between keywords and different groups of individuals.
|
Software: Transcription
|
- AGTK:
An annotation graph toolkit. Also available for Mac OS X.
- AX - all accents with one key:
A tiny, free, open source, Windows utility that uses just a single key to generate any accent or special character (within the limits of the character set being used). It is supplied with eleven European languages and can be easily re-configured. It takes moments to learn instead of forever fiddling with special keyboard codes or key combinations.
- Audiamus:
Audiamus builds a corpus of linked text and media. It is a cross-platform tool that allows presentation of textual material linked to unsegmented media files, using QuickTime to instantiate links. It was developed as a means of working interactively with field recordings and of presenting texts and example sentences as playable media with a dissertation.
- EXMARaLDA:
A system and toolset for creating, managing and analysing corpora of transcriptions of spoken language. Consists of an editor for transcriptions in musical score notation, a corpus manager and a search tool.
All file formats are XML-based, which maximizes exchangeability and archivability. Many import and export functions (Praat, ELAN, AGTK, RTF, HTML, SVG etc.).
- IPAKLICK:
A freely accessible tool that makes it easy to insert strings of IPA symbols (Unicode) into a text.
- IPANow! Software:
IPANow! by PhoneticSoft is a powerful yet simple tool that automatically transcribes Latin, Italian, German and French texts into International Phonetic Alphabet (IPA) symbols by applying rules used in scholarly lyric diction textbooks. Simply type or paste in a text, and with the click of a button IPANow! produces an IPA transcription underneath each line of text that can then be exported in Rich Text Format (.rtf).
IPANow! is designed as a lyric diction resource for choral conductors, professional vocalists, church musicians and music educators, but anyone can use it. IPANow! allows choral directors to easily produce professional-looking phonetic transcriptions of foreign language texts to distribute to choir members.
- Phonetics Builder:
A simple-to-use, free application for inserting phonetic characters into your documents, worksheets or lesson plans. Phonetics Builder can also be used to correctly format Pinyin for insertion into documents.
- Phonmap - Phonemic script writer:
Easily add phonemic script to Windows documents. No more searching font tables.
- Transformer:
The Transformer is a tool to convert between the file formats of various annotation programs (Praat, ELAN, Transcriber, CLAN, Transana (in preparation)). Transcript files can also be transformed into various output formats for publication, such as simple text, sociogram and musical score (Partitur) notation. Features include automatic calculation of pauses, selecting which speakers to include in the output,...
An English version will be available soon.
- WordStat:
WordStat is a text analysis module specifically designed to study textual information such as responses to open-ended questions, interviews, titles, journal articles, public speeches, electronic communications, etc. WordStat may be used for automatic categorization of text using a dictionary approach or text mining. WordStat can apply existing categorization dictionaries to a new text corpus. It also may be used in the development and validation of taxonomies. When used in conjunction with manual coding, this module can provide assistance for a more systematic application of coding rules, help uncover differences in word usage between subgroups of individuals, assist in the revision of existing coding using KWIC (Keyword-In-Context) tables, and assess the reliability of coding by the computation of inter-rater agreement statistics. WordStat includes numerous exploratory data analysis and graphical tools that may be used to explore the relationship between the content of documents and information stored in categorical or numeric variables such as the gender or the age of the respondent, year of publication, etc. Relationships among words or categories as well as document similarity may be identified using hierarchical clustering and multidimensional scaling analysis. Correspondence analysis and heatmap plots may be used to explore the relationship between keywords and different groups of individuals.
|
Software: Concordances
|
- A Simple Concordance Program:
Windows based program for creation of wordlists and concordances.
- Apple Pie Parser:
MonoConc Pro 2.0 and MonoConc 1.5: Two concordance programs for linguists and other language researchers.
- CLaRK - an XML-based System for Corpora Development:
CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources.
- Conc:
Concordance software for the Macintosh, developed by the Summer Institute of Linguistics.
- Concordance - the program:
Flexible text analysis software. Lets you gain better insight into e-texts. Make concordances, word lists, indexes. Count word frequencies, find phrases, and more. Publish results to the Web with one click. For Windows XP/2000/NT/ME/98/95.
- DictMaker:
Lets you create dictionaries. For Mac OS X.
- WordSmith Tools:
Part of a concordance on the word "hands", using Guardian newspaper text as the source.
- WordSmith Tools:
A suite of PC software for lexical analysis of corpora in a very wide variety of languages. Offers concordancing, wordlisting, key words analysis and a number of other utilities. WordSmith 3.0 (OUP, 1999) handles Windows 3.1 and better and is restricted to ASCII/ANSI text; WS 4.0 (2002) requires Windows 98B or better and handles Unicode as well as ASCII/ANSI text.
Version 4.0 was issued in 2004. This is a complete new edition with many limitations removed and numerous additional features, such as sound concordancing, use of Unicode, tools for obtaining text from the Internet, etc.
- aConCorde:
aConCorde is a multilingual concordance tool. Originally developed for native Arabic concordance, it possesses basic concordance functionality, as well as English and Arabic interfaces. It is written in Java, so it will run on any platform that has the Java Runtime Environment installed.
|
Software: Software Directories
|
- Bookmarks for Corpus-based Linguistics:
Links to corpus-based computational linguistics software.
- Corpus-based Computational Linguistics:
Many links to corpus-based computational linguistics software.
- Index of linguistics software:
Linguistic software from the University of Michigan.
- TeX/LaTeX Information:
A brief and useful overview for linguists interested in using LaTeX.
- The Humbul Humanities Hub resource for linguists:
A large link directory for linguistics in general. Can be successfully searched for
- Toolbox for linguistic research:
Toolbox for linguists, covering: ICT tools (office applications, data visualization, databases), web tools (social bookmarking, bookmarking of bibliographic information, research wikis), biblio tools (information about bibliographic databases relevant for linguists, bibliographic management tools), linguistics (linguistic journals) and corpus linguistics.
- WordNet:
An on-line lexical reference system. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept.
|
Speech Analysis (including Clinical Speech Analysis)
|
|
|
|
Page Updated: 21-Nov-2008

While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.
|
|