The language of mathematics computational, linguistic and logical ...


SDV (Sprache und Datenverarbeitung) ISSN 0343-5202 ISBN 978-3-942158-85-5

International Journal for Language Data Processing

Sprache und Datenverarbeitung

Universitätsverlag Rhein-Ruhr

SDV (Sprache und Datenverarbeitung) 38.1-2/2014


The language of mathematics: computational, linguistic and logical aspects. Editors: Marcos Cramer & Bernhard Schröder


Sprache und Datenverarbeitung. International Journal for Language Data Processing. Volume 38, 2014

Issue 1-2 (2014). Founded by Winfried Lenders and Harald Zimmermann. Published by Universitätsverlag Rhein-Ruhr, Paschacker 77, 47228 Duisburg. Editorial team: Hermann Cölfen, Essen; Ulrich Schmitz, Essen; Bernhard Schröder, Essen

Managing editor: Ulrich Schmitz, Universität Duisburg-Essen, Fakultät für Geisteswissenschaften, Universitätsstraße 12, 45117 Essen. E-mail: [email protected]. Layout: UVRR, Duisburg. Cover illustration: Michael Hüter, Bochum

Sprache und Datenverarbeitung on the Internet: http://www.linse.uni-due.de/sdv.html

Notice from the editors: Manuscripts are to be addressed to the managing editor. Authors submitting manuscripts must follow the guidelines sheet, which can be requested from the managing editor and is available on the web as a PDF file. The journal pays no honorarium. Authors receive one copy of the issue free of charge; reviews and short reports are excluded from this. Section 4 of the German Copyright Act (UrhRg) applies to the contributions published here. Unsolicited contributions are returned only on request (with return postage enclosed). Review copies should be sent to the address of the managing editor. After a review has appeared, the publisher concerned receives a voucher copy from the managing editor. The journal appears annually in two issues. Price valid from 1 January 2002 (plus postage in each case) for an annual subscription (2 issues): 43.50 euros

Orders are to be addressed to: AZN – Auslieferungszentrum Niederrhein der Butzon & Bercker GmbH, Hoogeweg 100, 47623 Kevelaer. E-mail: [email protected]. Advertisements: advertising price list no. 8 applies.

© The editors. All rights reserved. Printed in Germany.

ISSN 0343-5202 ISBN 978-3-95605-023-7 (Print) ISBN 978-3-95605-024-4 (E-Book)

UVRR Universitätsverlag Rhein-Ruhr

Bibliographic information of the Deutsche Nationalbibliothek:

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de. This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is prohibited and punishable by law. This applies in particular to reproduction, translation, microfilming, and storage and processing in electronic systems.

Contents

Marcos Cramer & Bernhard Schröder: Foreword of the special issue of SDV on “The language of mathematics: computational, linguistic and logical aspects”, pp. 5-7
Marcos Cramer: The Naproche system: Proof-checking mathematical texts in controlled natural language, pp. 9-33
Jordi Saludes & Sebastià Xambó: Multilingual Tools for Mathematics, pp. 35-48
Michael Kohlhase & Mihnea Iancu: Co-Representing Structure and Meaning of Mathematical Documents, pp. 49-80
Michael Richter & Roeland van Hout: A classification of German verbs using empirical language data and concepts of Vendler and Dowty, pp. 81-117

Addresses, p. 119

Marcos Cramer & Bernhard Schröder

Foreword of the special issue of SDV on “The language of mathematics: computational, linguistic and logical aspects”

Sprache und Datenverarbeitung 1-2 (2014): pp. 5-7

In recent years the specialized language of mathematics has become a topic of increased interest among linguists, logicians and computer scientists. This special issue of SDV provides insight into this interdisciplinary field of study through three papers that approach the study of mathematical language from different angles.

The language of mathematics is the specialized language that mathematicians and mathematics educators use to express and explain mathematical ideas and to formulate mathematical conjectures, theorems and proofs. It incorporates the syntax and semantics of general natural language, but distinguishes itself from common language by a number of special features, e.g. by the combination of natural language expressions with mathematical symbols and formulas, by the use of variables instead of anaphoric pronouns, by the use of conventional sentence structures that help to avoid ambiguities, by explicit ways of marking text structure, and by its adaptiveness through definitions that add new symbols and expressions to the vocabulary and fix their meaning.

In recent years, the field of computer mathematics has gained traction, with advances in interactive theorem provers facilitating the computer formalization of ever more mathematics. The completion of the formal proof of the Kepler conjecture by the Flyspeck project in 2014 has shown that the tools are ready for formalizing and formally verifying mathematics up to the level of current research. However, there is still a vast gap between the way mathematicians produce and present mathematics using the specialized language of mathematics, and the way mathematics is formalized in interactive theorem provers like HOL Light, Isabelle and Coq, which employ formal languages more similar in nature to programming languages than to the specialized language of mathematics. Narrowing this gap requires a combined effort of computer scientists, linguists and logicians studying the language of mathematics and modelling it from a computational perspective.

For linguists, an additional motivation for studying the language of mathematics is that it has some features, e.g. a relatively well-understood semantics, that make it an interesting and fruitful test-bed for linguistic theories. Additionally, the fact that mathematicians usually avoid ambiguities by conventional sentence structures makes the language a more feasible object of study for rule-based computational linguistics.

For logicians, the study of the language of mathematics is a fruitful endeavour, as it requires new logical approaches that combine logical methods developed for the study of natural language with approaches from mathematical logic.


All three papers in this special issue approach the study of the language of mathematics from an interdisciplinary perspective, even though the focus is quite different in each of them.

The paper The Naproche system: Proof-checking mathematical texts in controlled natural language by Marcos Cramer gives an overview of the linguistic and logical techniques developed for the Naproche system, a system for linguistically analyzing and formally verifying mathematical texts written in a controlled natural language, i.e. a subset of the language of mathematics defined through a formal grammar. The use of a controlled natural language as input language makes the Naproche system a bridge between interactive theorem provers like HOL Light, Isabelle and Coq on the one hand and the natural language of mathematics on the other hand. Apart from giving a general overview of the achievements of the Naproche project, the paper proposes a higher-order extension to Dynamic Predicate Logic, a logic that formalizes the dynamic nature of natural language quantification. The proposed extension is shown to be suitable for formalizing both the use of definitions for dynamically extending the language of mathematics, and the phenomenon of implicit function introduction in mathematical texts, exemplified by constructs of the form “for every x there is an f(x) such that …”.

The paper Multilingual Tools for Mathematics by Jordi Saludes and Sebastià Xambó presents and discusses the Mathematical Grammar Library (MGL), a system that enables rigorous multilingual machine processing of mathematical texts. MGL is coded in Grammatical Framework, a programming language for multilingual grammar applications. The paper describes the MGL architecture and presents the evaluation of the mOlto project, within which MGL was developed. Furthermore, the paper discusses how MGL was used to build a prototype of a multilingual dialogue system capable of helping students to solve word problems, i.e. mathematical problems that require the student to figure out the right equations to describe a real-world situation.

The paper Co-Representing Structure and Meaning of Mathematical Documents by Michael Kohlhase and Mihnea Iancu analyses various phenomena of mathematical language and deduces from them requirements for a representation format for mathematical documents (and other STEM documents). The analysed phenomena are divided into three levels of granularity: phrase structure, discourse structure and context/document structure. Apart from the requirement to account for the discussed phenomena of the language of mathematics, the authors point to three further requirements for a representation format for mathematical documents: “flexiformality” (a term introduced by the first author in a previous publication), logical pluralism and semantic underspecification. The paper shows that the OMDoc format meets most of the discussed requirements and is thus a suitable format for representing challenging linguistic/semantic phenomena of the language of mathematics.

Marcos Cramer


In addition to these papers related to the main topic, this double issue contains a submitted paper by Michael Richter and Roeland van Hout on a quite different topic. They examine how well verb classifications based on speakers’ judgements and usage-based data fit the classifications proposed by Vendler (1967), Dowty (1991), and Richter and van Hout (2010), in order to provide an empirically grounded classification.

Bernhard Schröder

Marcos Cramer

The Naproche system: Proof-checking mathematical texts in controlled natural language

Abstract

The Naproche system is a system for linguistically analyzing and proof-checking mathematical texts written in a controlled natural language, i.e. a subset of the usual natural language of mathematical texts defined through a formal grammar. This paper gives an overview of the linguistic and logical techniques developed for the Naproche system. Special attention is given to the dynamic nature of quantification in natural language, to the phenomenon of implicit function introduction in mathematical texts, and to the use of definitions for dynamically extending the language of a mathematical text.

Sprache und Datenverarbeitung 1-2 (2014): pp. 9-33

1 Introduction

The language of mathematics, i.e. the special language that is used in mathematical journals and textbooks, has some unique linguistic features on the syntactic, the semantic and the pragmatic level. On the syntactic level, for example, it can incorporate complex symbolic material into natural language sentences. On the semantic level, it refers to rigorously defined abstract objects, and is in general less open to ambiguity than most other text types. On the pragmatic level, it reverses the expectation on assertions, which have to be implied by the context rather than adding new information to it.

The work presented in this paper has been conducted in the context of the Naproche project, an interdisciplinary project between mathematical logic and computational and formal linguistics at the universities of Bonn and Duisburg-Essen (see [6] and section 1.4 of [11]). The Naproche project is guided by the vision of a computer program that could check the correctness of mathematical proofs written in the natural language of mathematics. Given that reliable processing of unrestricted natural language input is out of the reach of current technology, the project focuses on the attainable goal of using a controlled natural language (CNL), i.e. a subset of a natural language defined through a formal grammar, as input language to such a program. We have developed a prototype of such a computer program, the Naproche system. This paper presents the linguistic and logical theoretical framework of the Naproche system and its CNL.

The main application that we have in mind for the Naproche system is to make formal mathematics more natural and hence more accessible to the average mathematician. Formal mathematics is a branch of mathematics that aims at developing substantive parts of mathematics in a purely formal way. This is usually done with the help of computer programs, and the usual input languages of formal mathematics systems bear more resemblance to programming languages than to the natural language of mathematics. We think that this is one of the reasons why formal mathematics is not widely used by mathematicians outside the relatively small formal mathematics community. We hope to close the gap between the formal mathematics community and the rest of the mathematical community by developing a formal mathematics system that allows for a much more natural input language.

Before running the proof-checking algorithm that checks the logical correctness of a given input text, the Naproche system transforms the input text into a logical representation of its content, called a Proof Representation Structure (PRS). PRSs are an extension of Discourse Representation Structures (DRSs), enriched in such a way as to represent some special characteristics of the language of mathematics [6]. In this paper, as in [11], we use Dynamic Predicate Logic (DPL) and its extension Proof Text Logic (PTL) instead of DRSs and PRSs as the basis for the theoretical exposition. Just like Discourse Representation Theory, Dynamic Predicate Logic is a formal system aimed at capturing the dynamic nature of natural language quantification. But unlike Discourse Representation Theory, it has a close syntactical resemblance to standard systems of first-order predicate logic.

The logical and linguistic theory developed in the context of the Naproche project has been presented in detail in the author’s thesis [11].
This paper gives an overview of this theory, with a special focus on three topics: the phenomenon of implicit function introduction in mathematical texts, exemplified by constructs of the form “for every x there is an f(x) such that …”; the use of definitions for dynamically extending the language of a mathematical text; and quantifiers in bi-implications and reversed implications, which have to be treated in a similar way to quantifiers in donkey sentences.

Section 2 gives an overview of the language of mathematics, with a special focus on those of its features that are relevant for the discussion in this paper. Section 3 introduces and formally defines Dynamic Predicate Logic (DPL). In section 4, we give an overview of the controlled natural language implemented in the Naproche system. In section 5, we discuss some difficulties in parsing the symbolic expressions found in mathematical texts and sketch the solution to these problems that is implemented in the Naproche system. Section 6 discusses the phenomenon of implicit dynamic function introduction, and section 7 defines an extension of DPL called Typed Higher-Order Dynamic Predicate Logic (THODPL), which formalizes this phenomenon. In section 8, we explain the proof-checking algorithm implemented in the Naproche system. In section 9, we discuss how definitions are used to dynamically expand the language of mathematics, and how this is implemented in Naproche using the methods provided by THODPL. Section 10 discusses the interpretation of quantifiers in bi-implications and reversed implications, which have to be treated in a similar way to quantifiers in donkey sentences. Section 11 presents related work and section 12 concludes the paper.
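As a brief illustration of the dynamic quantification and the donkey sentences mentioned in this introduction, the following sketch may help. It uses the standard textbook example from the DPL literature, not an example taken from this paper, and the predicate names farmer, donkey, owns, beats and R are chosen here purely for illustration. In DPL, an existential quantifier in the antecedent of an implication can bind variables in the consequent, so the first formula has the same truth conditions as the second, classical one. The last line gives only a rough classical analogue of implicit function introduction; it is an assumption made for illustration and is not the THODPL formalization defined later in the paper.

```latex
% Donkey sentence: "If a farmer owns a donkey, he beats it."
% DPL reading: the quantifiers in the antecedent bind x and y in the consequent.
\[
  \bigl(\exists x\,(\mathrm{farmer}(x) \land \exists y\,(\mathrm{donkey}(y) \land \mathrm{owns}(x,y)))\bigr)
  \rightarrow \mathrm{beats}(x,y)
\]
% Classical first-order formula with the same truth conditions:
\[
  \forall x\,\forall y\,\bigl((\mathrm{farmer}(x) \land \mathrm{donkey}(y) \land \mathrm{owns}(x,y))
  \rightarrow \mathrm{beats}(x,y)\bigr)
\]
% Rough classical analogue (illustrative assumption) of
% "for every x there is an f(x) such that R(x, f(x))":
\[
  \exists f\,\forall x\,R(x, f(x))
\]
```

In the classical formula the existential quantifiers have to be rewritten as universal quantifiers with wide scope, whereas the DPL formula stays close to the surface structure of the sentence.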