'Alfa' is a scholarly review of linguistics published by the Linguistics Department of the Universidade Estadual do São Paulo, Brazil, carrying articles in a range of subfields of linguistics. The periodical was launched in 1962, and its 2005 volume (vol. 49) was the first to become available online. Since then, both of its annual issues have been published online, free of charge, without interruption. All articles are available in PDF format and are written in Portuguese, although the contents page of each issue provides English translations of the articles' titles. The articles are authored by an international body of scholars and cover a wide range of linguistic issues, from discussions of theoreticians such as Saussure and Bakhtin to questions of applied linguistics such as the role of games in language classrooms. The site also provides a search function for searching by volume, author and keyword. The website is particularly suited to researchers from any language background who are interested in theoretical linguistics, and its articles on applied linguistics are particularly relevant to Portuguese linguists.
The website of the English-Norwegian parallel corpus (ENPC) offers information about the ENPC project and the corpus itself. The corpus was developed at the Department of British and American Studies of the Universitetet i Oslo (University of Oslo), and consists of original Norwegian texts and translations from and into English. It is intended for contrastive analysis of the two languages and for translation studies. More detailed information about the corpus can be found in the ENPC manual, available on the site. The purpose of the manual is to describe the structure of the corpus and explain its markup. The manual starts with a description of the corpus, its aims and collection development policy, and proceeds to an explanation of its markup. It has a chapter on tags used for linguistic analysis, including the markup for direct speech and thought, and word-class tagging. The manual also describes the software written for the project, namely the Translation Corpus Aligner, which aligns texts automatically at the sentence level, and the Translation Corpus Explorer, which is a browser for parallel texts. It also offers a list of texts included in the corpus and a list of word-class elements allowed by the ENPC DTD, with notes on their usage. Links to publications (up to 2001) and to people involved with the project can also be found on the site, together with links to extensions of the project. The encoding behind the corpus is in broad agreement with the TEI Guidelines, though the ENPC DTD differs from the TEI DTD in some respects, mainly through the addition of new tags and entities (all modifications to the TEI DTD are described in Appendix 3 to the manual).
The chapter on markup includes a detailed description of the encoding recommended for the header, text and its divisions, paragraphs, S-units, words, headings, punctuation, highlighting and quotation, foreign elements, notes, lists, figures, editorial comments, links and other textual elements.
Although the site is no longer updated, the information remains relevant.
This website details the ongoing Historical Thesaurus of English (HTE) project. It describes the project itself and how the finished work will be organised, and lists publications that have benefited from the work on the thesaurus so far. The site also provides some sample entries, such as 'beer' and 'gin'. The HTE contains English words (including Old English) from their earliest written occurrence, giving information on when they fell out of use (where appropriate and known). It is based on the New Oxford English Dictionary. The HTE is organised into three sections: the External World, the Mind, and Society. Within each section, words are ordered chronologically and semantically (not alphabetically). The HTE allows the building of models of the vocabulary available at any one time, and it should be a valuable research tool for studying literary and linguistic history. The project received funding from the Arts and Humanities Research Board (AHRB) within the Research Grants scheme.
The International Corpus of English (ICE) website presents a corpus compilation project that aims to provide comparable corpora of English from different English-speaking regions around the world. Each corpus will contain one million words of spoken and written language, taken from a wide range of sources and situations. There is a common corpus design that is being used by every compilation team, and a common scheme for grammatical annotation, thus ensuring compatibility between the corpora. The site describes the corpus design and annotation schemes and provides information about the different ICE teams, including information about the different varieties of English, bibliographical references and related links. As of January 2008, the following corpora are available for download: Hong Kong; East Africa; India; Philippines; and Singapore. The corpora from Great Britain and New Zealand are available on CD. Sample sound files can be found on the website.
The ItalNet project consists of two major collections of interest to linguists: the Opera del Vocabolario Italiano and FIOLA, the Franco-Italian online archive. In addition to these, the website provides links to: the International Gramsci Society home page and online journal; the inventory catalogue of the drawings in the Biblioteca Ambrosiana, Milan; and the website of the exhibition Renaissance Dante in print (1472-1629). The Opera del Vocabolario Italiano is a database of early Italian writing, covering works written before 1375 (the year of Boccaccio's death). It currently contains approximately 2,000 documents, including the prose and poetry of Dante, Petrarch, Boccaccio, and other less famous poets, as well as merchants' records and medieval chronicles.
The collection totals over 21 million running words and around 480,000 distinct lexical forms. The texts have been classified by genre, and information is also available on their date of composition and linguistic area. The collection is available as a searchable database over the Internet to users who are registered with ItalNet or who access the database via an ARTFL subscribing institution. One can search for single and multiple words and phrases across the whole collection, or limit searches to single authors and works, time periods and linguistic areas. Results are available as a detailed concordance or as keyword-in-context (the latter showing only a single line of text for each occurrence). For each occurrence, an abbreviated reference is given (indicating page numbers), and a full bibliography is attached at the end of the results. In addition, results can be obtained as a table listing the number of occurrences of the keyword or phrase for each reference, in descending order of frequency. This is expressed as a simple count rather than a percentage. Depending on one's Web browser, one may print or save results as HTML or plain text files. It is not possible to access the full text of any single work contained in the Opera del Vocabolario Italiano.
FIOLA, the Franco-Italian online archive, is a new and at present very small collection of texts written in a mix of French and Italian. It currently contains only two documents, 'La Guerra di Attila' and 'L'Entrée d'Espagne', though further texts are being prepared for inclusion. It will concentrate on works written between the 12th century and the Renaissance. The collection is available as a searchable database in which one can look for words or phrases. Results give a count of all occurrences and a concordance. It is possible to browse through the full texts of FIOLA, though for this facility an ARTFL username and password must first be obtained.
The JRC-Acquis Multilingual Parallel Corpus is a collection of European Union legal texts in 22 of the member states' languages that has been aligned and encoded in XML, providing an invaluable tool for linguistic research and a resource for computational linguistic applications. The corpus consists of a selection of texts from the Acquis Communautaire (AC), the total body of European Union (EU) law applicable in the EU member states, and contains some 636 million word tokens. The languages included are Bulgarian, Czech, Swedish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Danish. The language pairs have been aligned automatically, using two different sets of software, and the alignments have not been proofread by humans. The texts are legal documents from different countries expressing EU legislation, and are thus not necessarily translations of each other. For example, the texts in the sub-corpus of aligned Finnish and Maltese are most likely not translations of each other but rather translations or interpretations of a separate original text. They are still parallel texts useful for translation studies or comparative studies. The complete corpus, either as separate texts in different languages or as aligned language pairs, is downloadable from the site in two versions. In addition, there is a bibliography of publications concerning the project, some of which are downloadable as PDF files. The corpus is thus a valuable tool for anyone interested in translation studies, comparative linguistics or European languages in general.
The website associated with the Newcastle Electronic Corpus of Tyneside English (NECTE) describes a project aiming to improve access to and promote the re-use of dialect recordings made in the Newcastle conurbation between 1969 and 1994. The original corpus consisted of 86 loosely structured interviews, most of which were subsequently phonetically and orthographically transcribed. Interviewees were drawn from a sample of the population of Gateshead in North-East England, spanning various social classes and age groups, and were encouraged to talk about their life histories and their attitudes to the local dialect. The more recent corpus (the ESRC-funded Phonological Variation and Change in Contemporary Spoken English), recorded in the early 1990s, set out to examine salient patterns of phonological variation and change in contemporary spoken British English, focusing on localised versus non-localised patterns of change. The NECTE project has amalgamated the two corpora and created the first TEI-conformant electronic vernacular corpus in a range of formats (sound files as well as phonetic and orthographic transcriptions that are also part-of-speech tagged). The site provides documentation about the original resources and the NECTE team's enhancement of them, together with information about the people involved, publications resulting from the project, references, links and appendices. The transcriptions and the audio files themselves are not accessible online. The site should be of use to anyone interested in the Geordie dialect, linguistics, sociology or sociolinguistics, and to members of the local public interested in changes in Tyneside expressions, folklore and reminiscences. The project was funded by the AHRC under its Resource Enhancement scheme. The resource can also be downloaded in XML format from the Oxford Text Archive (OTA) website (formerly part of the Arts and Humanities Data Service (AHDS)).
This Web page describes the project Recent Grammatical Change in British and American English: A Corpus-based Approach, conducted by Professor Geoffrey Leech of the University of Lancaster. The site lists his publications on the subject and describes the Brown family of corpora of British and American written English used in the project: the Brown Corpus, the LOB (Lancaster-Oslo/Bergen) Corpus, the FROWN (Freiburg-Brown) Corpus and the FLOB (Freiburg Lancaster Oslo/Bergen) Corpus. The project aims to chart and analyse changes in frequency in the use of the English language within the thirty-year period 1961-1991. The focus is on changes in the usage of modal auxiliaries, semi-modals, aspect, tense and mood, and on other areas such as noun phrase categories, questions and punctuation. The findings are described on the site and compared to provisional findings regarding spoken English. The project received a Research Grant from the Arts and Humanities Research Board (AHRB).