Publications

In case you've been wondering how to cite my papers after my transition, please use my name as specified in the original paper (and in the listing below). I do not want to hide my personal history and the fact that I spent 50 years trying to live as a man. You might also consider switching to a citation style such as APA that uses only initials for given names.

Monographs

Evert, Stefan (2004, published 2005). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Dissertation, Institut für maschinelle Sprachverarbeitung, University of Stuttgart, URN urn:nbn:de:bsz:93-opus-23714. [official version, PDF, companion website]

Hoffmann, Sebastian; Evert, Stefan; Smith, Nicholas; Lee, David; Berglund Prytz, Ylva (2008). Corpus Linguistics with BNCweb - a Practical Guide, volume 6 of English Corpus Linguistics. Peter Lang, Frankfurt am Main. [ordering information]

Journal Papers

Adrian, Axel; Dykes, Natalie; Evert, Stephanie; Heinrich, Philipp; Keuchen, Michael (2022). Entwicklung und Evaluation automatischer Verfahren zur Anonymisierung von Gerichtsentscheidungen. Legal Tech, 2022(4), 233–238. [manuscript (PDF), journal homepage]

Dykes, Natalie; Evert, Stefan; Göttlinger, Merlin; Heinrich, Philipp; Schröder, Lutz (2021). Argument parsing via corpus queries. it – Information Technology, 63(1), 31–44. [journal homepage]

Dykes, Natalie; Evert, Stefan; Göttlinger, Merlin; Heinrich, Philipp; Schröder, Lutz (2020). Reconstructing arguments from noisy text. Datenbank-Spektrum. [PDF, journal homepage]

Evert, Stefan; Heinrich, Philipp; Henselmann, Klaus; Rabenstein, Ulrich; Scherr, Elisabeth; Schmitt, Martin; Schröder, Lutz (2019). Combining machine learning and semantic features in the classification of corporate disclosures. Journal of Logic, Language and Information, 28, 309–330. [accepted manuscript (PDF), journal homepage]

Schäfer, Fabian; Evert, Stefan; Heinrich, Philipp (2017). Japan's 2014 general election: Political bots, right-wing internet activism and PM Abe Shinzō's hidden nationalist agenda. Big Data, 5(4), 294–309. [open access (PDF)]

Evert, Stefan; Proisl, Thomas; Jannidis, Fotis; Reger, Isabella; Pielström, Steffen; Schöch, Christof; Vitt, Thorsten (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 22(suppl_2), ii4–ii16. [free access (PDF), reference corpus, R code & data sets]

Evert, Stefan; Greiner, Paul; Baigger, João Filipe; Lang, Bastian (2016). A distributional approach to open questions in market research. Computers in Industry, 78, 16–28. [accepted manuscript (PDF), journal homepage]

Lapesa, Gabriella and Evert, Stefan (2014). A large scale evaluation of distributional semantic models: Parameters, interactions and model selection. Transactions of the Association for Computational Linguistics, 2, 531-545. [PDF, supplementary material]

Biemann, Chris; Bildhauer, Felix; Evert, Stefan; Goldhahn, Dirk; Quasthoff, Uwe; Schäfer, Roland; Simon, Johannes; Swiezinski, Leonard; Zesch, Torsten (2013). Scalable construction of high-quality Web corpora. Journal for Language Technology and Computational Linguistics (JLCL), 28(2), 23-59. [PDF]

Ansorge, Ulrich; Reynvoet, Bert; Hendler, Jessica; Oettl, Lennart; Evert, Stefan (2013). Conditional automaticity in subliminal morphosyntactic priming. Psychological Research, 77, 399-421.

Michelbacher, Lukas; Evert, Stefan; Schütze, Hinrich (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7(2), 245-276.

Evert, Stefan (2006). How random is a corpus? The library metaphor. Zeitschrift für Anglistik und Amerikanistik, 54(2), 177-190. [manuscript (PDF), journal homepage]

Carletta, Jean; Evert, Stefan; Heid, Ulrich; Kilgour, Jonathan; Chen, Yiya (2005). The NITE XML Toolkit: data model and query language. Language Resources and Evaluation, 39(4), 313-334. [NXT homepage, journal homepage]

Evert, Stefan and Krenn, Brigitte (2005). Using small random samples for the manual evaluation of statistical association measures. Computer Speech & Language, 19(4), 450-466. [manuscript (PDF)]

Carletta, Jean; Evert, Stefan; Heid, Ulrich; Kilgour, Jonathan; Robertson, Judy; Voormann, Holger (2003). The NITE XML toolkit: Flexible annotation for multimodal language data. Behavior Research Methods, Instruments, & Computers, 35(3), 353-363. [PDF, journal homepage]

Book Chapters

Evert, Stefan (2013). Tools for the acquisition of lexical combinatorics. In R. H. Gouws, U. Heid, W. Schweickard, and H. E. Wiegand (eds.), Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent Developments with Focus on Electronic and Computational Lexicography (HSK 5.4), chapter 104, pages 1415-1432. Mouton de Gruyter, Berlin, New York.

Evert, Stefan; Frötschl, Bernhard; Lindstrot, Wolf (2009). Statistische Grundlagen. In K.-U. Carstensen, C. Ebert, C. Ebert, S. Jekat, R. Klabunde, and H. Langer (eds.), Computerlinguistik und Sprachtechnologie: Eine Einführung, pages 114-158. Spektrum Akademischer Verlag, Heidelberg, 3rd edition.

Ebert, Christian; Schiehlen, Michael; Klabunde, Ralf; Evert, Stefan (2009). Semantik. In K.-U. Carstensen, C. Ebert, C. Ebert, S. Jekat, R. Klabunde, and H. Langer (eds.), Computerlinguistik und Sprachtechnologie: Eine Einführung, pages 330-393. Spektrum Akademischer Verlag, Heidelberg, 3rd edition.

Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, article 58, pages 1212-1248. Mouton de Gruyter, Berlin. [extended manuscript (PDF)]

Baroni, Marco and Evert, Stefan (2008). Statistical methods for corpus exploitation. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, article 36, pages 777-803. Mouton de Gruyter, Berlin. [manuscript (PDF)]

Evert, Stefan and Fitschen, Arne (2001). Textkorpora. In K.-U. Carstensen, C. Ebert, C. Endriss, S. Jekat, R. Klabunde, and H. Langer (eds.), Computerlinguistik und Sprachtechnologie - Eine Einführung, pages 369-376. Spektrum Akademischer Verlag, Heidelberg, Berlin.

Conference Proceedings and Collections

2024

Adrian, Axel; Evert, Stephanie; Heinrich, Philipp; Keuchen, Michael (2024). Auslegung des KI-VO-E zur Evaluation von Verfahren der Künstlichen Intelligenz am Beispiel der automatischen Anonymisierung von Gerichtsentscheidungen. In Juristische Sprachmodelle – Tagungsband des 27. Internationalen Rechtsinformatik Symposions IRIS 2024. Editions Weblaw. [manuscript (PDF)] 🥇 LexisNexis best paper award

2023

Adrian, Axel; Dykes, Nathan; Evert, Stephanie; Heinrich, Philipp; Keuchen, Michael (2023). Automatische Anonymisierung von Gerichtsurteilen: Eine Vision scheint realisierbar. In E. Schweighofer, J. Zanol, and S. Eder (eds.), Rechtsinformatik als Methodenwissenschaft des Rechts – Tagungsband des 26. Internationalen Rechtsinformatik Symposions IRIS 2023. Editions Weblaw. [manuscript (PDF), publisher homepage]

2022

Adrian, Axel; Dykes, Nathan; Evert, Stephanie; Heinrich, Philipp; Keuchen, Michael; Proisl, Thomas (2022). Manuelle und automatische Anonymisierung von Urteilen. In A. Adrian, S. Evert, M. Kohlhase, and M. Zwickel (eds.), Digitalisierung von Zivilprozess und Rechtsdurchsetzung. Duncker & Humblot, Berlin.

Blombach, Andreas; Evert, Stephanie; Jannidis, Fotis; Pielström, Steffen; Konle, Leonard; Proisl, Thomas (2022). Exploring lexical diversities. In Digital Humanities 2022: Conference Abstracts, pages 130–133, Tokyo, Japan / online. [PDF]

Dykes, Nathan; Heinrich, Philipp; Evert, Stephanie (2022). Retrieving Twitter argumentation with corpus queries and discourse analysis. In S. Flach and M. Hilpert (eds.), Broadening the Spectrum of Corpus Linguistics. New approaches to variability and change, number 105 in Studies in Corpus Linguistics. John Benjamins. [publisher homepage]

Evert, Stephanie (2022). Measuring keyness. In Digital Humanities 2022: Conference Abstracts, pages 202–205, Tokyo, Japan / online. [PDF, osf.io/cy6mw]

2021

Adrian, Axel; Evert, Stefan; Keuchen, Michael; Heinrich, Philipp; Dykes, Natalie (2021). Anonymisierung von Gerichtsurteilen – Eine wesentliche Voraussetzung für E-Justice. In E. Schweighofer, F. Kummer, A. Saarenpää, S. Eder, and P. Hanke (eds.), Cybergovernance – Tagungsband des 24. Internationalen Rechtsinformatik Symposions IRIS 2021, pages 137–149. Editions Weblaw. [manuscript (PDF), publisher homepage]

Evert, Stefan and Lapesa, Gabriella (2021). FAST: A carefully sampled and cognitively motivated dataset for distributional semantic evaluation. In Proceedings of the 25th Conference on Computational Natural Language Learning (CoNLL 2021), pages 588–595, Online. [PDF, ACL anthology, FAST data set]

Neumann, Stella and Evert, Stefan (2021). A register variation perspective on varieties of English. In E. Seoane and D. Biber (eds.), Corpus-based approaches to register variation, chapter 6, pages 143–178. Benjamins, Amsterdam. [online supplement, manuscript (PDF), publisher homepage]

Tayebi Arasteh, Soroosh; Monajem, Mehrpad; Christlein, Vincent; Heinrich, Philipp; Nicolaou, Anguelos; Boldaji, Hamidreza Naderi; Lotfinia, Mahshad; Evert, Stefan (2021). How will your tweet be received? Predicting the sentiment polarity of tweet replies. In Proceedings of the 15th IEEE International Conference on Semantic Computing (ICSC 2021), pages 356–359. [PDF, data set]

2020

Evert, Stefan; Harlamov, Oleg; Heinrich, Philipp; Banski, Piotr (2020). Corpus Query Lingua Franca part II: Ontology. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France. [PDF, GitHub]

Proisl, Thomas; Dykes, Natalie; Heinrich, Philipp; Kabashi, Besim; Blombach, Andreas; Evert, Stefan (2020). EmpiriST corpus 2.0: Adding manual normalization, lemmatization and semantic tagging to a German Web and CMC corpus. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France. [PDF, corpus & resources]

2019

Anthony, Laurence and Evert, Stefan (2019). Embracing the concept of data interoperability in corpus tools development. In Proceedings of the Corpus Linguistics 2019 Conference, Cardiff, UK. [PDF]

Dykes, Natalie; Evert, Stefan; Peters, Joachim; Heinrich, Philipp (2019). Argumentation is key: A keyword-based study of arguments in online discourse. In Proceedings of the Corpus Linguistics 2019 Conference, Cardiff, UK. [PDF, slides]

Proisl, Thomas; Konle, Leonard; Evert, Stefan; Jannidis, Fotis (2019). Dependenzbasierte syntaktische Komplexitätsmaße. In Proceedings of DHd 2019, pages 270–273, Frankfurt & Mainz, Germany. [PDF, poster]

2018

Heinrich, Philipp; Adrian, Christoph; Kalashnikova, Olena; Schäfer, Fabian; Evert, Stefan (2018). A transnational analysis of news and tweets about “nuclear phase-out” in the aftermath of the Fukushima incident. In Proceedings of the 1st Workshop on Computational Impact Detection from Text Data (CIDTD 2018), pages 8–16, Miyazaki, Japan. [PDF, slides]

Proisl, Thomas; Evert, Stefan; Jannidis, Fotis; Schöch, Christof; Konle, Leonard; Pielström, Steffen (2018). Delta vs. n-gram tracing: Evaluating the robustness of authorship attribution methods. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pages 3309–3314, Miyazaki, Japan. [PDF, slides]

Proisl, Thomas; Heinrich, Philipp; Kabashi, Besim; Evert, Stefan (2018). EmotiKLUE at IEST 2018: Topic-informed classification of implicit emotions. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 235–242, Brussels, Belgium. [PDF, software]

Uhrig, Peter; Evert, Stefan; Proisl, Thomas (2018). Collocation candidate extraction from dependency-annotated corpora: Exploring differences across parsers and dependency annotation schemes. In P. Cantos-Gómez and M. Almela-Sánchez (eds.), Lexical Collocation Analysis: Advances and Applications, pages 111–140. Springer International Publishing, Cham. [publisher homepage]

2017

Evert, Stefan; Heinrich, Philipp; Henselmann, Klaus; Rabenstein, Ulrich; Scherr, Elisabeth; Schröder, Lutz (2017). Combining machine learning and semantic features in the classification of corporate disclosures. In Proceedings of the Workshop on Logic and Algorithms in Computational Linguistics (LACompLing 2017), pages 47–62, Stockholm, Sweden. [PDF]

Evert, Stefan and Neumann, Stella (2017). The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German. In G. De Sutter, M.-A. Lefer, and I. Delaere (eds.), Empirical Translation Studies. New Theoretical and Methodological Traditions, number 300 in Trends in Linguistics. Studies and Monographs (TiLSM). Mouton de Gruyter, Berlin. Online supplement: http://www.stefan-evert.de/PUB/EvertNeumann2017/.

Evert, Stefan; Uhrig, Peter; Bartsch, Sabine; Proisl, Thomas (2017). E-VIEW-alation – a large-scale evaluation study of association measures for collocation identification. In Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference, pages 531–549, Leiden, The Netherlands. [PDF, slides, video, E-VIEW-alation]

Evert, Stefan; Wankerl, Sebastian; Nöth, Elmar (2017). Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch. In Proceedings of the Corpus Linguistics 2017 Conference, Birmingham, UK. [PDF, slides]

Lapesa, Gabriella and Evert, Stefan (2017). Large-scale evaluation of dependency-based DSMs: Are they worth the effort? In Proceedings of the 15th Annual Meeting of the European Association for Computational Linguistics (EACL 2017), pages 394–400, Valencia, Spain. [PDF, supplementary material]

Proisl, Thomas; Heinrich, Philipp; Evert, Stefan; Kabashi, Besim (2017). Translation inference across dictionaries via a combination of graph-based methods and co-occurrence statistics. In Proceedings of the LDK 2017 Workshops: Shared Task on Translation Inference Across Dictionaries (TIAD), pages 94–102, Galway, Ireland. CEUR. [PDF]

Wankerl, Sebastian; Nöth, Elmar; Evert, Stefan (2017). An n-gram based approach to the automatic diagnosis of Alzheimer's disease from spoken language. In Proceedings of INTERSPEECH 2017, pages 3162–3166, Stockholm, Sweden. [PDF]

2016

Beißwenger, Michael; Bartsch, Sabine; Evert, Stefan; Würzner, Kay-Michael (2016). EmpiriST 2015: A shared task on the automatic linguistic annotation of computer-mediated communication and web corpora. In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, pages 44-56, Berlin, Germany. [PDF, task homepage]

Evert, Stefan (2016). CogALex-V shared task: Mach5 – a traditional DSM approach to semantic relatedness. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V), pages 92–97, Osaka, Japan. [PDF, system & data]

Evert, Stefan; Jannidis, Fotis; Dimpel, Friedrich Michael; Schöch, Christof; Pielström, Steffen; Vitt, Thorsten; Reger, Isabella; Büttner, Andreas; Proisl, Thomas (2016). “Delta” in der stilometrischen Autorschaftsattribution. In Proceedings of DHd 2016, pages 61–74, Leipzig, Germany. [HTML]

Santus, Enrico; Gladkova, Anna; Evert, Stefan; Lenci, Alessandro (2016). The CogALex-V shared task on the corpus-based identification of semantic relations. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V), pages 69–79, Osaka, Japan. [PDF, task homepage]

Wankerl, Sebastian; Nöth, Elmar; Evert, Stefan (2016). An analysis of perplexity to reveal the effects of Alzheimer's disease on language. In ITG-Fachbericht 267: Speech Communication, pages 254–259, Paderborn, Germany. [PDF]

2015

Evert, Stefan and Arppe, Antti (2015). Some theoretical and experimental observations on naïve discriminative learning. In Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6), Tübingen, Germany. [PDF, handout (PDF), slides (PDF)]

Evert, Stefan and Hardie, Andrew (2015). Ziggurat: A new data model and indexing format for large annotated text corpora. In Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3), pages 21-27, Lancaster, UK. [PDF]

Evert, Stefan; Proisl, Thomas; Jannidis, Fotis; Pielström, Steffen; Schöch, Christof; Vitt, Thorsten (2015). Towards a better understanding of Burrows's Delta in literary authorship attribution. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature, Denver, CO. Co-located with NAACL-HLT 2015. [PDF]

Plotnikova, Nataliia; Kohl, Micha; Volkert, Kevin; Lerner, Andreas; Dykes, Natalie; Ermer, Heiko; Evert, Stefan (2015). KLUEless: Polarity classification and association. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 619-625, Denver, Colorado. [PDF, extended report with qualitative evaluation]

Plotnikova, Nataliia; Lapesa, Gabriella; Proisl, Thomas; Evert, Stefan (2015). SemantiKLUE: Semantic textual similarity with maximum weight matching. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 111-116, Denver, Colorado. [PDF]

2014

Bartsch, Sabine and Evert, Stefan (2014). Towards a Firthian notion of collocation. In A. Abel and L. Lemnitzer (eds.), Vernetzungsstrategien, Zugriffsstrukturen und automatisch ermittelte Angaben in Internetwörterbüchern, number 2/2014 in OPAL - Online publizierte Arbeiten zur Linguistik, pages 48-61. Institut für Deutsche Sprache, Mannheim. [PDF]

Diwersy, Sascha; Evert, Stefan; Neumann, Stella (2014). A weakly supervised multivariate approach to the study of language variation. In B. Szmrecsanyi and B. Wälchli (eds.), Aggregating Dialectology, Typology, and Register Analysis. Linguistic Variation in Text and Speech, Linguae et Litterae: Publications of the School of Language and Literature, Freiburg Institute for Advanced Studies. De Gruyter, Berlin. [online access, earlier manuscript (PDF)]

Evert, Stefan (2014). Distributional semantics in R with the wordspace package. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations, pages 110-114, Dublin, Ireland. [PDF, wordspace homepage]

Evert, Stefan; Proisl, Thomas; Greiner, Paul; Kabashi, Besim (2014). SentiKLUE: Updating a polarity classifier in 48 hours. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Dublin, Ireland. [PDF]

Lapesa, Gabriella and Evert, Stefan (2014). NaDiR: Naive distributional response generation. In Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex), pages 50-59, Dublin, Ireland. [PDF]

Lapesa, Gabriella; Evert, Stefan; Schulte im Walde, Sabine (2014). Contrasting syntagmatic and paradigmatic relations: Insights from distributional semantic models. In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014), pages 160-170, Dublin, Ireland. [PDF]

Proisl, Thomas; Evert, Stefan; Greiner, Paul; Kabashi, Besim (2014). SemantiKLUE: Robust semantic similarity at multiple levels using maximum weight matching. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Dublin, Ireland. [PDF]

Schulze Wettendorf, Clemens; Jegan, Robin; Körner, Allan; Zerche, Julia; Plotnikova, Nataliia; Moreth, Julian; Schertl, Tamara; Obermeyer, Verena; Streil, Susanne; Willacker, Tamara; Evert, Stefan (2014). SNAP: A multi-stage XML-pipeline for aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 578–584, Dublin, Ireland. [PDF]

2013

Greiner, Paul; Proisl, Thomas; Evert, Stefan; Kabashi, Besim (2013). KLUE-CORE: A regression model of semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 181-186, Atlanta, Georgia, USA. [PDF]

Lapesa, Gabriella and Evert, Stefan (2013). Evaluating neighbor rank and distance measures as predictors of semantic priming. In Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2013), Sofia, Bulgaria. [PDF]

Proisl, Thomas; Greiner, Paul; Evert, Stefan; Kabashi, Besim (2013). KLUE: Simple and robust methods for polarity classification. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 395-401, Atlanta, Georgia, USA. [PDF]

2012

Boleda, Gemma; Evert, Stefan; Gehrke, Berit; McNally, Louise (2012). Adjectives as saturators vs. modifiers: Statistical evidence. In M. Aloni, V. Kimmelman, F. Roelofsen, G. W. Sassoon, K. Schulz, and M. Westera (eds.), Logic, Language and Meaning. Proceedings of the 18th Amsterdam Colloquium, volume 7218 of Lecture Notes in Computer Science, pages 112-121. Springer, Berlin, Heidelberg. [PDF]

2011

Evert, Stefan and Hardie, Andrew (2011). Twenty-first century corpus workbench: Updating a query architecture for the new millennium. In Proceedings of the Corpus Linguistics 2011 Conference, Birmingham, UK. [PDF]

Ebert, Cornelia; Evert, Stefan; Wilmes, Katharina (2011). Focus marking via gestures. In I. Reich et al. (eds.), Proceedings of Sinn & Bedeutung 15, Saarbrücken, Germany. Universaar - Saarland University Press. [PDF]

2010

Evert, Stefan (2010). Google Web 1T5 n-grams made easy (but not for the computer). In Proceedings of the 6th Web as Corpus Workshop (WAC-6), Los Angeles, CA. [PDF]

2009

Giesbrecht, Eugenie and Evert, Stefan (2009). Part-of-speech tagging - a solved task? An evaluation of POS taggers for the Web as corpus. In I. Alegria, I. Leturia, and S. Sharoff (eds.), Proceedings of the 5th Web as Corpus Workshop (WAC5), San Sebastian, Spain. [PDF]

2008

Evert, Stefan (2008). A lightweight and efficient tool for cleaning Web pages. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco. [PDF]

Evert, Stefan (2008). A lexicographic evaluation of German adjective-noun collocations. In Proceedings of the LREC Workshop Towards a Shared Task for Multiword Expressions (MWE 2008), Marrakech, Morocco. [PDF]

2007

Baroni, Marco and Evert, Stefan (2007). Words and echoes: Assessing and mitigating the non-randomness problem in word frequency distribution modeling. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 904-911, Prague, Czech Republic. [PDF, talk slides (PDF)]

Bauer, Daniel; Degen, Judith; Deng, Xiaoye; Herger, Priska; Gasthaus, Jan; Giesbrecht, Eugenie; Jansen, Lina; Kalina, Christin; Krüger, Thorben; Märtin, Robert; Schmidt, Martin; Scholler, Simon; Steger, Johannes; Stemle, Egon; Evert, Stefan (2007). FIASCO: Filtering the Internet by automatic subtree classification, Osnabrück. In C. Fairon, H. Naets, A. Kilgarriff, and G.-M. de Schryver (eds.), Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop (WAC3), incorporating CLEANEVAL, pages 111-121, Louvain-la-Neuve, Belgium. [PDF]

Evert, Stefan (2007). StupidOS: A high-precision approach to boilerplate removal. In C. Fairon, H. Naets, A. Kilgarriff, and G.-M. de Schryver (eds.), Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop (WAC3), incorporating CLEANEVAL, pages 123-133, Louvain-la-Neuve, Belgium. [PDF]

Evert, Stefan and Baroni, Marco (2007). zipfR: Word frequency distributions in R. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Posters and Demonstrations Session, pages 29-32, Prague, Czech Republic. [PDF]

Lüdeling, Anke; Evert, Stefan; Baroni, Marco (2007). Using Web data for linguistic purposes. In M. Hundt, N. Nesselhauf, and C. Biewer (eds.), Corpus Linguistics and the Web, volume 59 of Language and Computers - Studies in Practical Linguistics, pages 7-24. Rodopi, Amsterdam, New York. [manuscript (PDF)]

Michelbacher, Lukas; Evert, Stefan; Schütze, Hinrich (2007). Asymmetric association measures. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, Bulgaria. [PDF]

2006

Bernardini, Silvia; Baroni, Marco; Evert, Stefan (2006). A WaCky introduction. In M. Baroni and S. Bernardini (eds.), Wacky! Working papers on the Web as Corpus, pages 9-40. GEDIT, Bologna. [http://wackybook.sslmit.unibo.it/]

Hoffmann, Sebastian and Evert, Stefan (2006). BNCweb (CQP-edition): The marriage of two corpus tools. In S. Braun, K. Kohn, and J. Mukherjee (eds.), Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, volume 3 of English Corpus Linguistics, pages 177-195. Peter Lang, Frankfurt am Main. [PDF]

2005

Baroni, Marco and Evert, Stefan (2005). Testing the extrapolation quality of word frequency models. In P. Danielsson and M. Wagenmakers (eds.), Proceedings of Corpus Linguistics 2005, volume 1 of The Corpus Linguistics Conference Series. ISSN 1747-9398. [PDF]

Evert, Stefan and Schönenberger, Manuela (2005). Separating the sheep from the goats: Clarifying corpus content using XML. In P. Danielsson and M. Wagenmakers (eds.), Proceedings of Corpus Linguistics 2005, volume 1 of The Corpus Linguistics Conference Series. ISSN 1747-9398. [PDF]

Krenn, Brigitte and Evert, Stefan (2005). Separating the wheat from the chaff: Corpus-driven evaluation of statistical association measures for collocation extraction. In B. Fisseni, H.-C. Schmitz, B. Schröder, and P. Wagner (eds.), Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005 in Bonn, volume 8 of Computer Studies in Language and Speech, pages 104-117. Peter Lang, Frankfurt am Main. [PDF]

Lüdeling, Anke and Evert, Stefan (2005). The emergence of productive non-medical -itis. Corpus evidence and qualitative analysis. In S. Kepser and M. Reis (eds.), Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives, pages 351-370. Mouton de Gruyter, Berlin, New York. [manuscript (PDF)]

2004

Evert, Stefan (2004a). A simple LNRE model for random character sequences. In Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles, pages 411-422, Louvain-la-Neuve, Belgium. [PDF]

Evert, Stefan (2004b). The statistical analysis of morphosyntactic distributions. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), pages 1539-1542, Lisbon, Portugal. [PDF]

Evert, Stefan (2004c). Significance tests for the evaluation of ranking methods. In Proceedings of the 20th International Conference on Computational Linguistics (Coling 2004), pages 945-951, Geneva, Switzerland. [PDF]

Evert, Stefan; Heid, Ulrich; Spranger, Kristina (2004). Identifying morphosyntactic preferences in collocations. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), pages 907-910, Lisbon, Portugal. [PDF]

Evert, Stefan; Heid, Ulrich; Säuberlich, Bettina; Debus-Gregor, Esther; Scholze-Stubenrecht, Werner (2004). Supporting corpus-based dictionary updating. In Proceedings of the 11th Euralex International Congress, pages 255-264, Lorient, France. [PDF]

Krenn, Brigitte; Evert, Stefan; Zinsmeister, Heike (2004). Determining intercoder agreement for a collocation identification task. In Proceedings of KONVENS 2004, pages 89-96, Vienna, Austria. [PDF]

Lüdeling, Anke and Evert, Stefan (2004). The emergence of productive non-medical -itis: corpus evidence and qualitative analysis. In Proceedings of the First International Conference on Linguistic Evidence, pages 91-95, Tübingen, Germany. [PDF]

2003

Evert, Stefan and Kermes, Hannah (2003a). Experiments on candidate data for collocation extraction. In Companion Volume to the Proceedings of the 10th Conference of The European Chapter of the Association for Computational Linguistics, pages 83-86. [PDF]

Evert, Stefan and Kermes, Hannah (2003b). Annotation, storage, and retrieval of mildly recursive structures. In K. Simov and P. Osenova (eds.), Proceedings of the Workshop on Shallow Processing of Large Corpora (SProLaC 2003), pages 23-33, Lancaster, UK. [PDF]

Carletta, Jean; Kilgour, Jonathan; O'Donnell, Timothy; Evert, Stefan; Voormann, Holger (2003). The NITE object model library for handling structured linguistic annotation on multimodal data sets. In Proceedings of the EACL Workshop on Language Technology and the Semantic Web (3rd Workshop on NLP and XML, NLPXML-2003), pages 17-24, Budapest, Hungary. [PDF]

Kermes, Hannah and Evert, Stefan (2003). Text analysis meets corpus linguistics. In D. Archer, P. Rayson, A. Wilson, and T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference, pages 402-411. UCREL. [PDF]

Lüdeling, Anke and Evert, Stefan (2003). Linguistic experience and productivity: corpus evidence for fine-grained distinctions. In D. Archer, P. Rayson, A. Wilson, and T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference, pages 475-483. UCREL. [PDF]

2002

Kermes, Hannah and Evert, Stefan (2002). YAC - a recursive chunker for unrestricted German text. In M. G. Rodriguez and C. P. Araujo (eds.), Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), volume V, pages 1805-1812, Las Palmas, Spain.

2001

Evert, Stefan and Krenn, Brigitte (2001). Methods for the qualitative evaluation of lexical association measures. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 188-195, Toulouse, France. [PDF, colour plots]

Evert, Stefan and Lüdeling, Anke (2001). Measuring morphological productivity: Is automatic preprocessing sufficient? In P. Rayson, A. Wilson, T. McEnery, A. Hardie, and S. Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference, pages 167-175, Lancaster. UCREL. [PDF]

Kermes, Hannah and Evert, Stefan (2001). Exploiting large corpora: A circular process of partial syntactic analysis, corpus query and extraction of lexicographic information. In P. Rayson, A. Wilson, T. McEnery, A. Hardie, and S. Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference, pages 332-340, Lancaster. UCREL. [PDF]

Krenn, Brigitte and Evert, Stefan (2001). Can we do better than frequency? A case study on extracting PP-verb collocations. In Proceedings of the ACL Workshop on Collocations, pages 39-46, Toulouse, France. [PDF, colour plots]

2000

Evert, Stefan; Heid, Ulrich; Lezius, Wolfgang (2000). Methoden zum Vergleich von Signifikanzmaßen zur Kollokationsidentifikation. In W. Zühlke and E. G. Schukat-Talamazzini (eds.), KONVENS-2000 Sprachkommunikation, pages 215-220. VDE-Verlag. [PDF]

Berman, Steve; Evert, Stefan; Heid, Ulrich (2000). Searchable metaspaces. In Proceedings of the EAGLES/ISLE Workshop on Metadata, Athens, Greece. [PDF]

Heid, Ulrich; Evert, Stefan; Docherty, Vincent; Worsch, Wolfgang; Wermke, Matthias (2000). A data collection for semi-automatic corpus-based updating of dictionaries. In U. Heid, S. Evert, E. Lehmann, and C. Rohrer (eds.), Proceedings of the 9th EURALEX International Congress, pages 183-195.

Lüdeling, Anke; Evert, Stefan; Heid, Ulrich (2000). On measuring morphological productivity. In W. Zühlke and E. G. Schukat-Talamazzini (eds.), KONVENS-2000 Sprachkommunikation, pages 57-61. VDE-Verlag. [PDF]

Edited Volumes

Adrian, Axel; Evert, Stephanie; Kohlhase, Michael; Zwickel, Martin (eds.) (2022). Digitalisierung von Zivilprozess und Rechtsdurchsetzung. Number 284 in Schriften zum Prozessrecht. Duncker & Humblot, Berlin.

Griebel, Tim; Evert, Stefan; Heinrich, Philipp (eds.) (2020). Multimodal Approaches to Media Discourses: Reconstructing the Age of Austerity in the United Kingdom. Routledge Studies in Multimodality. Routledge, Abingdon. [publisher homepage]

Rayson, Paul; Villada Moirón, Begoña; Sharoff, Serge; Piao, Scott; Evert, Stefan (eds.) (2010). Special issue on Multiword expressions: hard going or plain sailing? International Journal of Language Resources and Evaluation 44(1–2). [CfP, Springer Link]

Heid, Ulrich; Evert, Stefan; Lehmann, Egbert; Rohrer, Christian (eds.) (2000). Proceedings of the 9th EURALEX International Congress, Stuttgart, Germany.

Other Publications

Heinrich, Philipp; Dykes, Natalie; Evert, Stefan (2021). Annotator agreement in the anonymization of court decisions. Presentation at the Corpus Linguistics 2021 Conference, Limerick/online. [slides, video]

Diwersy, Sascha; Evert, Stefan; Heinrich, Philipp; Proisl, Thomas (2019). Means of productivity – On the statistical modelling of the restrictedness of lexico-grammatical patterns. Presentation at EUROPHRAS 2019: Productive Patterns in Phraseology, Santiago de Compostela, Spain. [slides]

Evert, Stefan and Heid, Ulrich (2019). Between collocation and construction: Lexical preferences in non-idiomatic word combinations. Presentation at EUROPHRAS 2019: Productive Patterns in Phraseology, Santiago de Compostela, Spain. [slides]

Evert, Stefan; Proisl, Thomas; Uhrig, Peter; Khokhlova, Maria (2018). Contrastive collocation analysis – A comparison of association measures across three different languages using dependency-parsed corpora. Presentation at the XVIIIth EURALEX International Congress, Ljubljana, Slovenia. [abstract, slides]

Evert, Stefan; Dykes, Natalie; Peters, Joachim (2018). A quantitative evaluation of keyword measures for corpus-based discourse analysis. Presentation at the Corpora & Discourse International Conference (CAD 2018), Lancaster, UK. [abstract, slides]

Proisl, Thomas and Evert, Stefan (2018). Delta vs. N-Gram-Tracing: Wie robust ist die Autorschaftsattribuierung? Poster at the DHd 2018 Conference, Cologne, Germany. [poster]

Evert, Stefan (2017a). Measures of productivity and lexical diversity. Poster at the ICAME 38 Conference, Prague, Czech Republic. [abstract, poster] – best poster award

Evert, Stefan (2017b). Making sense of multivariate analyses of linguistic variation. Poster at the Corpus Linguistics 2017 Conference, Birmingham, UK. [abstract, poster, additional material]

Evert, Stefan; Heinrich, Philipp; Schäfer, Fabian (2017). Social Bots in Japan's 2014 General Election: Preliminary Results from a Corpus-Linguistic and Qualitative Study of Computational Propaganda on Twitter. Presentation at the International Conference on Computational Social Science (IC2S2 2017), Cologne, Germany. [abstract]

Neumann, Stella; Evert, Stefan; De Sutter, Gert (2017). Register-specific interference in translation. Presentation at the Annual Meeting of the German Linguistics Association (DGfS 2017), Saarbrücken, Germany. [abstract, slides]

Evert, Stefan; Jannidis, Fotis; Proisl, Thomas; Vitt, Thorsten; Schöch, Christof; Pielström, Steffen; Reger, Isabella (2016). Outliers or Key Profiles? Understanding Distance Measures for Authorship Attribution. Presentation at Digital Humanities 2016, Kraków, Poland. [abstract]

Evert, Stefan; Schneider, Gerold; Brezina, Vaclav; Gries, Stefan Th.; Lijffijt, Jefrey; Rayson, Paul; Wallis, Sean; Hardie, Andrew (2015). Corpus statistics: key issues and controversies. Panel discussion at Corpus Linguistics 2015, Lancaster, UK. [abstract]

Evert, Stefan; Proisl, Thomas; Schöch, Christof; Jannidis, Fotis; Pielström, Steffen; Vitt, Thorsten (2015). Explaining Delta, or: How do distance measures for authorship attribution work? Presentation at Corpus Linguistics 2015, Lancaster, UK. [abstract, slides]

Bartsch, Sabine; Evert, Stefan; Proisl, Thomas; Uhrig, Peter (2015). (Association) measure for measure. Presentation at ICAME 36, Trier, Germany.

Lapesa, Gabriella; Schulte im Walde, Sabine; Evert, Stefan (2014). Judging Paradigmatic Relations: A Collection of Ratings for English. Poster at Architecture and Mechanisms of Language Processing (AMLAP-2014), Edinburgh, UK. [poster (PDF)]

Evert, Stefan and Neumann, Stella (2013). The impact of translation direction on the characteristics of translated texts: a multivariate analysis for English and German. Presentation at the 46th Annual Meeting of the Societas Linguistica Europaea, Split, Croatia.

Bartsch, Sabine and Evert, Stefan (2013b). Exploring the Firthian notion of collocation. Presentation at Corpus Linguistics 2013, Lancaster, UK.

Evert, Stefan; Schneider, Gerold; Lehmann, Hans Martin (2013). Statistical modelling of natural language for descriptive linguistics. Presentation at Corpus Linguistics 2013, Lancaster, UK.

Lapesa, Gabriella and Evert, Stefan (2013b). Thematic Roles and Semantic Space. Insights from Distributional Semantic Models. Presentation at Quantitative Investigations in Theoretical Linguistics (QITL-5), Leuven, Belgium. [abstract (PDF), slides (PDF)]

Lapesa, Gabriella and Evert, Stefan (2013a). Item-based Prediction of Reaction Times in Priming: an Evaluation of Distributional Semantic Models. Poster at Architecture and Mechanisms of Language Processing (AMLAP-2013), Marseille, France. [poster (PDF)]

Sánchez Marco, Cristina; Marín, Rafael; Evert, Stefan (2012). The lexical extension of estar + participle through psychological verbs. Presentation at the Annual Meeting of the German Linguistics Association (DGfS 2012), Frankfurt, Germany.

Sánchez Marco, Cristina; Marín, Rafael; Evert, Stefan (2012). Measuring lexical extension: The case of Spanish estar + past participle. Poster presentation at Linguistic Evidence 2012, Tübingen, Germany. [extended abstract (PDF)]

Evert, Stefan (2011). Quantitative measures of productivity and their significance. Presentation at Corpus Linguistics 2011, Birmingham, UK. [handout (PDF)]

Sánchez Marco, Cristina and Evert, Stefan (2011). Measuring semantic change: The case of Spanish participial constructions. Poster presentation at QITL-4, Berlin, Germany.

Evert, Stefan and Pipa, Gordon (2010). Probability estimation of rare events in linguistics and computational neuroscience. Presentation at KogWis 2010, Potsdam, Germany. [abstract, handout]

Pipa, Gordon and Evert, Stefan (2010). Statistical models of non-randomness in natural language. Presentation at KogWis 2010, Potsdam, Germany. [abstract (PDF)]

Evert, Stefan (2009). Rethinking corpus frequencies. Presentation at the ICAME 30 Conference, Lancaster, UK. [handout (PDF)]

Evert, Stefan (2007). Room for improvement? Upper limits on collocation extraction with statistical association measures. Poster presentation at the Computational Linguistics Poster Session at the Annual Meeting of the German Linguistics Association (DGfS 2007). [poster (PDF)]

Evert, Stefan and Baroni, Marco (2006). ZipfR: Working with words and other rare events in R. Presentation at the useR! 2006 Conference, Vienna, Austria. [handout (PDF)]

Lüdeling, Anke; Baroni, Marco; Evert, Stefan (2006). Need and competition in word formation and where to find data to study them. Poster presentation at the Second International Conference on Linguistic Evidence, Tübingen, Germany. [abstract (PDF), poster (PDF)]

Lüdeling, Anke; Baroni, Marco; Evert, Stefan (2006). Need and competition: Deconstructing quantitative productivity. Presentation at the Second Conference on Quantitative Investigations in Theoretical Linguistics (QITL-2), Osnabrück, Germany. [abstract (PDF)]

Evert, Stefan (2005). Empirical research on association measures: The UCS toolkit. Software demonstration at the Phraseology 2005 Conference, Louvain-la-Neuve, Belgium. [abstract (PDF)]

Evert, Stefan and Krenn, Brigitte (2005). Exploratory collocation extraction. Presentation at the Phraseology 2005 Conference, Louvain-la-Neuve, Belgium. [abstract (PDF)]

Evert, Stefan and Hoffmann, Sebastian (2005). BNCweb (CQP edition): The marriage of two corpus tools. Presentation at the Corpus Linguistics 2005 Conference, Birmingham, UK.

Evert, Stefan (2004d). An on-line repository of association measures. http://www.collocations.de/AM/.

Evert, Stefan; Carletta, Jean; O'Donnell, Timothy J.; Kilgour, Jonathan; Vögele, Andreas; Voormann, Holger (2003). The NXT object model. Technical report, IMS, University of Stuttgart. Version 2.1. [PDF]

Evert, Stefan and Voormann, Holger (2003). NQL - a query language for multi-modal language data. Technical report, IMS, University of Stuttgart. Version 2.1. [PDF]

Evert, Stefan and Kermes, Hannah (2002). The influence of linguistic pre-processing on candidate data. In Workshop on Computational Approaches to Collocations, Vienna, Austria.

Schönenberger, Manuela and Evert, Stefan (2002). The benefit of doubt. Presentation at the Workshop on Quantitative Investigations in Theoretical Linguistics (QITL), Osnabrück, Germany, October 2002. Slides can be downloaded from http://www.cogsci.uni-osnabrueck.de/qitl/.

Evert, Stefan (1999). Das Verhalten von Lösungen der vektoriellen Helmholtzgleichung in Außenräumen für kleine Frequenzen unter elektrischen Randbedingungen. Unpublished Diplom thesis, University of Stuttgart. [PDF]