Principles of Linguistic Analysis
(M903) - Stella Markantonatou, Giorgos Markopoulos, Alexandros Tantos
Course Description
Dear students,
We welcome you to M903 ‘Principles of linguistic analysis’, which is one of the 8 required courses of the MSc ‘Language Technologies’ (MSc LT from now on).
M903 has been designed to discuss the “texture”, so to speak, of language, which is the subject matter of MSc LT. We adopted the established way of carving up the linguistic continuum into fields of study (phonetics, phonology, morphology, syntax, semantics and pragmatics) and focused on three of them: morphology, syntax and semantics. Our aim is to offer a “bird’s eye view” of two cornerstone issues in Language Technologies:
- (Some of) the main questions asked in the study of these three fields that are also well-known problems for LT.
- The representation of linguistic facts for Natural Language Processing (NLP) purposes.
At the same time, we will also explore how ideas in the field have developed, and you will acquire some technical skills.
The course covers a vast area of scientific knowledge. We thought that instructors with dedicated training in each area would make the presentation of the topics more interesting to you. So here we are, in the order in which we will deliver the lectures:
- Giorgos Markopoulos, Assistant Professor in Linguistics (official appointment pending), Department of Mediterranean Studies, University of the Aegean
Giorgos Markopoulos will talk about morphology, which studies the internal structure of words.
Temporary personal page: https://www.researchgate.net/profile/Giorgos_Markopoulos
Email: g.markopoulos@aegean.gr
- Stella Markantonatou, Computational Linguist, Research Director, Institute for Language and Speech Processing / “Athena” Research Center, http://www.ilsp.gr/en/profile/staff?view=member&id=38&task=show (Greek version: http://www.ilsp.gr/el/profile/staff?view=member&id=38&task=show)
Stella Markantonatou will present the main issues in syntax, which studies the structure of linguistic strings as relations between words or constellations of words.
Email: marks@athenarc.gr, stiliani.markantonatou@gmail.com (checked more frequently)
- Alexandros Tantos, Assistant Professor in Text Linguistics and Corpus Linguistics, Department of Philology, School of Philosophy, Aristotle University of Thessaloniki, https://www.lit.auth.gr/atantos/en/ (Greek version: https://www.lit.auth.gr/node/1559)
Alexandros Tantos will focus on the main topics of the three levels of formal semantic analysis: lexical, sentential and discourse semantics.
Email: alextantos@lit.auth.gr
Stella Markantonatou has overall responsibility for the course. It goes without saying that all of us welcome your creative questions and inquiries.
The course comprises 13 lectures. L1 and L2 will be delivered by Giorgos, L3–L9 by Stella and L10–L12 by Alexandros; L13 will be devoted to discussing the overall experience, answering any remaining questions and tying up loose ends. All the lectures are available in the M903 space on the platform, with the caveat that their authors may change their content during the course. At the end of this document, we append the main references used in the course; individual lectures may offer additional bibliography on particular topics. We have tried to make sure that the references are freely accessible.
At this point we would like to add a piece of advice: do pay attention to the course. Your cohort includes graduate students from a variety of disciplines, and linguists make up a mighty 50% of it. Neither the linguists nor the students from other disciplines have been trained in formal linguistics as it is viewed from the NLP perspective, nor do they have the relevant practical skills: M903 aims to provide you with exactly this knowledge and these skills. Your attendance will be recorded at every lecture (it is, as a matter of fact, a required course). Do work in groups to which each member contributes good humor and a strong will to learn (we take these as given), as well as useful ideas and hands-on work.
The course makes heavy use of software that has been developed either for teaching purposes or for regular NLP use. You will be asked to use the software both to follow the lectures and to deliver the work on which your participation in M903 will be evaluated. Instructions for installing the software are available on the course page. Please install the software in the following order: the LFG-Parser by the second week of the course and the MWE Toolkit by the eighth week. You will use the LFG-Parser from the third lecture (L3) onwards and the MWE Toolkit in L8 and L9. If any problems occur, please get in touch with Stella Markantonatou.
Evaluation of your participation in M903:
1. An exercise, given in week 12, to evaluate your understanding of the material delivered in the lectures. It will consist of four or five topics distributed among the three main components of the course. Contribution to your overall mark: 50%.
2. An activity of your choice, either (2a) or (2b). You will be asked to choose between them right at the beginning of the course. Contribution to your overall mark: 50%.
2a. A research activity for groups of two or three people. Topics: (i) development of specifications for Greek Universal Dependencies (UD), mainly supervised by Stella Markantonatou and Giorgos Markopoulos; (ii) use of the MWE Toolkit for discovering idioms in Twitter and literature corpora; (iii) the study of gradation in verb idioms with empirical techniques. The research activity will allow you to be creative and, of course, to gain a deeper understanding of the issues and of the available techniques for dealing with them. It is due two months after the end of M903.
2b. Presentation of a scientific paper that contributes specialised but necessary information on the topics discussed in the course. Contribution to your overall mark: 50%. It will be delivered during the course.
More information about (1) and (2) will be provided in L1.
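Topic (ii) of the research activity concerns idiom discovery. A common first step in this kind of work is to rank candidate word sequences with an association measure, so that pairs which co-occur more often than chance would predict rise to the top. The sketch below illustrates one such measure, pointwise mutual information (PMI), over adjacent word pairs. It is a minimal, independent illustration, not the MWE Toolkit's own implementation; the function name `pmi_bigrams` and the `min_count` threshold are hypothetical choices made for this example.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated by relative frequency over the token sequence.
    Pairs seen fewer than `min_count` times are skipped, because PMI
    strongly overrates very rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy token sequence; a real run would read a tokenised corpus.
    tokens = "he kicked the bucket and she kicked the bucket too".split()
    for pair, score in pmi_bigrams(tokens):
        print(pair, round(score, 2))
```

On real Twitter or literature corpora you would typically lemmatise first (Greek verb idioms inflect heavily) and compare several association measures, since PMI on its own still favours low-frequency pairs.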
In L13 we will ask you for feedback about M903. This is not a mere formality: we will take your feedback seriously in planning M903 2021 and, if necessary, MSc LT 2021. Your sincere and objective contribution will be invaluable. Please do express both collective views (formed after a group meeting) and personal ones.
We welcome you to M903, and we do hope to share with you our excitement about MSc LT, M903 and their content.
Welcome to the world of Language Technology
Stella, Alexandros, Giorgos
BIBLIOGRAPHY (IN GREEK)
Zagoura, Angeliki. 2019. Σχεδιασμός και ανάπτυξη διαδικτυακού γλωσσαρίου όρων γλωσσολογίας [Design and development of an online glossary of linguistic terms]. MA dissertation, University of Patras. https://nemertes.lis.upatras.gr/jspui/handle/10889/12280
Tantos, Alexandros, Stella Markantonatou, Anna Anastasiadi-Symeonidi and Panagiota Kyriakopoulou. 2015. Υπολογιστική γλωσσολογία [Computational Linguistics]. [e-book] Athens: Hellenic Academic Libraries Link (HEAL-Link). Available at: http://hdl.handle.net/11419/2205 and https://repository.kallipos.gr/handle/11419/2205
BIBLIOGRAPHY (IN ENGLISH)
Andrews, Avery D. 2007. The major functions of the noun phrase. In Timothy Shopen (Ed.), Language Typology and Syntactic Description (pp. 132-223). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511619427.003
Asudeh, Ash and Ida Toivonen. 2009. Lexical-Functional Grammar. In Bernd Heine and Heiko Narrog (eds.), The Oxford Handbook of Linguistic Analysis. Oxford: Oxford University Press. http://users.ox.ac.uk/~cpgl0036/pdf/asudeh-toivonen09-lfg-ohla.pDF
Bender, Emily M. 2013. Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Synthesis Lectures on Human Language Technologies #20. Morgan & Claypool Publishers. http://libgen.rs/search.php?req=Linguistic%20fundamentals%20for%20&lg_topic=libgen&open=0&view=simple&res=25&phrase=1&column=title&fbclid=IwAR3ADhTlzy_cLeMn-HxCeX6FQf3jhhssbWY-PJlHpdaaRlbP3LFE-7sBg4g
Jurafsky, Dan and James H. Martin. 2020. Speech and Language Processing (3rd ed. draft). https://web.stanford.edu/~jurafsky/slp3/
Levin, Beth. 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
Osborne, Timothy and Kim Gerdes. 2019. The status of function words in dependency grammar: A critique of Universal Dependencies (UD). Glossa: a journal of general linguistics 4(1): 17, 1–28. DOI: https://doi.org/10.5334/gjgl.537
Przepiórkowski, Adam and Agnieszka Patejuk. 2018. Arguments and Adjuncts in Universal Dependencies. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018), Santa Fe, New Mexico, USA, pp. 3837–3852. https://www.aclweb.org/anthology/C18-1324
WEB RESOURCES
ILSP NLP Web Services http://nlp.ilsp.gr/soaplab2-axis/
Online glossary of linguistics terminology: Greek-English & English-Greek http://users.uoi.gr/gjxydo/lexicon/glossary.html
The Parole Tagset with examples http://nlp.ilsp.gr/nlp/tagset_examples/tagset_en/index.html
Universal Dependencies https://universaldependencies.org/u/overview/tokenization.html
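UD treebanks, including the Greek one relevant to topic (i) of the research activity, are distributed in the CoNLL-U format: one syntactic word per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The minimal reader below is a hypothetical helper written only to illustrate the format; for real work a maintained library such as the `conllu` package on PyPI is preferable. The annotated sample sentence is hand-made for illustration and is not taken from the Greek UD treebank.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of word dicts).

    To keep the sketch short, multiword-token lines (IDs such as '1-2')
    and empty nodes (IDs such as '5.1') are skipped.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():            # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):      # sentence-level comments (sent_id, text)
            continue
        else:
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                # multiword token or empty node
            word = dict(zip(FIELDS, cols))
            word["id"] = int(word["id"])
            word["head"] = int(word["head"]) if word["head"] != "_" else None
            current.append(word)
    if current:                         # the input may not end with a blank line
        sentences.append(current)
    return sentences


# Hand-made illustrative annotation (FEATS left empty for brevity).
ROWS = [
    ["1", "Ο", "ο", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "Γιάννης", "Γιάννης", "PROPN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "είδε", "βλέπω", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", "τον", "ο", "DET", "_", "_", "5", "det", "_", "_"],
    ["5", "άντρα", "άντρας", "NOUN", "_", "_", "3", "obj", "_", "SpaceAfter=No"],
    ["6", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = ("# text = Ο Γιάννης είδε τον άντρα.\n"
          + "\n".join("\t".join(row) for row in ROWS) + "\n\n")

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for word in sentence:
            # IDs of syntactic words run 1..n, so head h is at list index h - 1.
            head = "ROOT" if word["head"] == 0 else sentence[word["head"] - 1]["form"]
            print(f'{word["form"]} --{word["deprel"]}--> {head}')
```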
SOFTWARE TO BE USED IN M903
LFG-Parser http://ioperm.org/lfg-parser.html
If you have Java installed on your computer, just go to the site and follow the instructions. Otherwise, first install Java (for Windows, see https://www.java.com/en/download/faq/java_win64bit.xml) and then install the LFG-Parser.
Before coming to L3, also download Sample Grammars, unzip the archive and place it in the same folder where you installed the parser. Start the LFG-Parser, select the grammar 01.cfg.ambig.gr from the folder greek-cfg and run it on the sentence ο Γιάννης είδε τον άντρα με το τηλεσκόπιο (‘John saw the man with the telescope’). To get parses you should:
- click the Parse button of the LFG-Parser
- paste into the top empty field of the “new parse” window the sentence exactly as it appears in the grammar, without the hash (#) symbol(s), which are used there to comment the sentence out
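The grammar file name suggests that this exercise is about ambiguity: the prepositional phrase με το τηλεσκόπιο can attach either to the verb (John used the telescope) or to the noun άντρα (the man had the telescope). The sketch below counts the parse trees of the exercise sentence with the CKY algorithm over a toy context-free grammar in Chomsky Normal Form. The grammar, the lexicon and the function `count_parses` are hypothetical, written for illustration only; they are not the contents of 01.cfg.ambig.gr, which the LFG-Parser processes with its own machinery.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (NOT the course grammar file).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase: seeing *with* the telescope
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase: the man *with* the telescope
    ("PP", "P", "NP"),
]
LEXICON = {
    "ο": {"Det"}, "τον": {"Det"}, "το": {"Det"},
    "Γιάννης": {"N"}, "άντρα": {"N"}, "τηλεσκόπιο": {"N"},
    "είδε": {"V"}, "με": {"P"},
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` with the CKY algorithm."""
    n = len(words)
    # chart[i][j] maps a category to the number of ways it can span words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # every split point of the span
                for parent, left, right in BINARY:
                    ways_left = chart[i][k].get(left, 0)
                    ways_right = chart[k][j].get(right, 0)
                    if ways_left and ways_right:
                        chart[i][j][parent] += ways_left * ways_right
    return chart[0][n].get(start, 0)


if __name__ == "__main__":
    sentence = "ο Γιάννης είδε τον άντρα με το τηλεσκόπιο".split()
    print(count_parses(sentence))  # 2: verb attachment and noun attachment
```

Dropping the prepositional phrase (ο Γιάννης είδε τον άντρα) leaves a single parse, which is a quick way to see that the ambiguity comes from the two PP-attachment rules.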
Mwetoolkit
Instructions for installing the program
On Windows, first install Ubuntu (which runs on the Windows Subsystem for Linux) from the Microsoft Store:
https://www.microsoft.com/en-us/p/ubuntu/9nblggh4msv6?activetab=pivot:overviewtab
If there are problems during installation:
Enable the Windows Subsystem for Linux:
- Go to: Settings > Update & Security > For Developers > Developer Mode.
- In the Search menu, search for: Turn Windows Features On or Off
- A window will open. Tick the box next to the option that reads “Windows Subsystem for Linux (Beta)”
- Restart your computer
- In the Search menu, search for Ubuntu and launch it
Mwetoolkit installation (run the following commands in the Ubuntu terminal):
- git clone "https://gitlab.com/mwetoolkit/mwetoolkit3.git"
- cd mwetoolkit3
- make
- cd test
- ./testAll.sh
- cd ..
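Before running the installation commands above inside Ubuntu, it may help to check that the command-line tools they rely on are available. The script below is a small sketch for this. The list of tools is an assumption inferred from the commands themselves (`git` for cloning, `make` for building, `bash` for the test script) plus `python3`, on the assumption that mwetoolkit3 runs on Python 3; it is not an official dependency list.

```python
import shutil

# Assumed prerequisites, inferred from the installation commands (not an official list).
REQUIRED = ["git", "make", "bash", "python3"]


def missing_tools(tools=REQUIRED, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All assumed prerequisites were found.")
```

On Ubuntu, missing tools can usually be installed with apt, for example `sudo apt install git make python3`.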
Date created
Wednesday, 9 September 2020