Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Noushin Rezapour Asheghi, Serge Sharoff, and Katja Markert. 2016. Crowdsourcing for web genre annotation. Language Resources and Evaluation, 50(3):603–641.
Douglas Biber and Jesse Egbert. 2015. Using grammatical features for automatic register identification in an unrestricted corpus of documents from the Open Web. Journal of Research Design and Statistics in Linguistics and Communication Science, 2(1).
Douglas Biber, S. Johansson, G. Leech, Susan Conrad, and E. Finegan. 1999. The Longman Grammar of Spoken and Written English. Longman, London.
Douglas Biber, Jesse Egbert, and Mark Davies. 2015. Exploring the composition of the searchable web: a corpus-based taxonomy of web registers. Corpora, 10(1):11–45.
Douglas Biber. 1989. Variation across speech and writing. Cambridge University Press, Cambridge.
Douglas Biber. 1995. Dimensions of Register Variation: A Cross-linguistic Comparison. Cambridge University Press, Cambridge.
Bernd Bohnet. 2010. Very high accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 89–97, Stroudsburg, PA, USA. Association for Computational Linguistics.
Pedro Carpena, Pedro Bernaola-Galván, Michael Hackenberg, Ana V. Coronado, and Jose L. Oliver. 2009. Level statistics of words: Finding keywords in literary texts and symbolic sequences. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 79(3):035102.
Kevin Crowston, Barbara Kwasnik, and Joseph Rubleske, 2011. Genres on the Web: Computational Models and Empirical Studies, chapter Problems in the Use-Centered Development of a Taxonomy of Web Genres, pages 69–84. Springer Netherlands, Dordrecht.
Jesse Egbert, Douglas Biber, and Mark Davies. 2015. Developing a bottom-up, user-based method of web register classification. Journal of the Association for Information Science and Technology, 66(9):1817–1831.
S. Meyer zu Eissen and Benno Stein. 2004. Genre classification of web pages: User study and feasibility analysis. In Proceedings of the 27th Annual German Conference on Artificial Intelligence, pages 256–259.
Eugenie Giesbrecht and Stefan Evert. 2009. Is part-of-speech tagging a solved task? An evaluation of POS taggers for the German Web as Corpus. In Web as Corpus Workshop (WAC5), pages 27–36.
Stefan Gries, John Newman, and Cyrus Shaoul. 2011. N-grams and the clustering of registers. Empirical Language Research.
Stefan Gries, 2012. Methodological and analytic frontiers in lexical research, chapter Behavioral Profiles: a fine-grained and quantitative approach in corpus-based lexical semantics. John Benjamins, Amsterdam and Philadelphia.
Isabelle M. Guyon and Andre Elisseeff. 2003. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182.
Jenna Kanerva, Matti Luotolahti, Veronika Laippala, and Filip Ginter. 2014. Syntactic n-gram collection from a large-scale corpus of Internet Finnish. In Proceedings of the Sixth International Conference Baltic HLT 2014, pages 184–191. IOS Press.
Adam Kilgarriff and Gregory Grefenstette. 2003. Introduction to the special issue on Web as Corpus. Computational Linguistics, 29(3).
Christoph Lindemann and Lars Littig, 2011. Genres on the Web: Computational Models and Empirical Studies, chapter Classification of Web Sites at Super-genre Level, pages 211–235. Springer Netherlands, Dordrecht.
Juhani Luotolahti, Jenna Kanerva, Veronika Laippala, Sampo Pyysalo, and Filip Ginter. 2015. Towards universal web parsebanks. In Proceedings of the International Conference on Dependency Linguistics (Depling’15), pages 211–220. Uppsala University.
C.R. Miller. 1984. Genre as social action. Quarterly Journal of Speech, 70(2):151–167.
Marina Santini and Serge Sharoff. 2009. Web genre benchmark under construction. JLCL, 24(1):129–145.
Roland Schäfer and Felix Bildhauer, 2016. Proceedings of the 10th Web as Corpus Workshop, chapter Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison, pages 1–6. Association for Computational Linguistics.
Mike Scott and Christopher Tribble. 2006. Textual Patterns: keyword and corpus analysis in language education. Benjamins, Amsterdam.
Serge Sharoff, Zhili Wu, and Katja Markert. 2010. The web library of Babel: evaluating genre collections.
John Sinclair. 1996. Preliminary recommendations on corpus typology.
John Swales. 1990. Genre analysis: English in academic and research settings. Cambridge University Press, Cambridge.
Vedrana Vidulin, Mitja Luštrek, and Matjaž Gams. 2007. Using genres to improve search engines. In Workshop “Towards genre-enabled Search Engines: The impact of NLP” at RANLP, pages 45–51.
Bonnie Webber. 2009. Genre distinctions for discourse in the Penn treebank. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, pages 674–682. Association for Computational Linguistics.