The SIGIL project aims to offer a very gentle introduction to the principles and methods of statistical inference for linguists and computational linguists with little mathematical background. Our approach balances theoretical explanations, often supported by simulation experiments, with practical hands-on work in the R statistical environment.
The materials provided on this page have evolved from introductory courses that we taught in various locations, together as well as separately. We plan to write an extended manuscript covering a broader range of statistical methods, but we do not have a fixed schedule for this spare-time project. There is also an R software package (with convenience functions for frequency tests and some example data sets) that is scheduled for a major redesign.
If you have a recent version of R, you can download and install our corpora package (version 0.3-2) with the built-in package installer. The current version was used in introductory courses previously taught by the authors; it provides specialised functionality and some data sets for that purpose. We hope to make a more generally useful package available before the end of 2008.
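For readers who prefer the R console to the graphical installer, the installation step might look roughly like the sketch below. It assumes the package is available from CRAN under the name "corpora" and that your machine has internet access; the version installed this way is whatever CRAN currently offers, which may differ from 0.3-2.

```r
# Install the corpora package from CRAN.
# Assumption: a CRAN mirror is reachable; R may prompt you to choose one.
install.packages("corpora")

# Load the package into the current R session.
library(corpora)

# List the example data sets shipped with the installed version.
data(package = "corpora")
```

The last call only lists the names and titles of the bundled data sets; it does not load any of them into the workspace.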
SIGIL is a spare-time project of Marco Baroni (CIMeC, University of Trento) and Stefan Evert (CogSci, University of Osnabrück).