GT ALGA Annual Meeting 2019

October 10-11, 2019 - Paris (DI ENS)


Antoine Amarilli

Query evaluation on probabilistic data: New hard cases

Abstract: This talk studies the problem of query evaluation over probabilistic data (aka probabilistic query evaluation or PQE). We fix a query that describes a pattern of interest in the data, and we are given as input a uncertain relational database where every fact is annotated with an independent probability of existence. The PQE problem asks for the probability, according to the database, that the pattern of interest exists. The PQE problem can also be phrased in terms of probabilistic graphs, where we ask what is the probability of subgraphs where some fixed pattern can be found. The complexity of PQE has been studied for unions of conjunctive queries (UCQ), which intuitively ask if some fixed patterns have a homomorphism to the target graph. In this context, a dichotomy has been shown in 2012 by Dalvi and Suciu: PQE is either in PTIME or is intractable (#P-hard) depending on the UCQ. In this talk, we will present two recent results showing more hard cases for PQE. The first result (joint work with İsmail İlkan Ceylan) deals with arity-two signatures, where it extends the dichotomy of Dalvi and Suciu to arbitrary queries closed under homomorphisms, e.g., regular path queries and Datalog. We show that PQE is #P-hard for any such query, unless it is equivalent to a UCQ and covered by the existing dichotomy. The second one (joint work with Benny Kimelfeld) studies conjunctive queries without self-joins, a special case of UCQs, and shows that queries with a #P-hard PQE problem are already hard for the unweighted version of PQE where all facts have probability 1/2. Equivalently, it is already intractable to count how many subinstances contain the pattern.

Florent Capelli

Proof systems for #SAT

Abstract: In this talk, we will show how one can augment classes of Boolean circuits used in knowledge compilation so that one can efficiently check that they are equivalent to a given CNF formula. We will show that it can be applied to define Cook-Reckhow proof systems for #SAT and maxSAT and how one can use this idea to output certificates from existing tools solving #SAT.

Arnaud Durand

Recursion schemes, discrete differential equations and characterization of polynomial time computations

Abstract:

Léo Exibard

Synthesis of Data Word Transducers

Abstract: In reactive synthesis, the goal is to automatically generate an implementation from a specification of the reactive and non-terminating input/output behaviours of a system. Specifications are usually modelled as logical formulae or automata over infinite sequences of signals (ω-words), while implementations are represented as transducers. In the classical setting, the set of signals is assumed to be finite. In this paper, we do not make such an assumption and consider data ω-words instead, i.e., words over an infinite alphabet. In this context, we study specifications and implementations respectively given as automata and transducers extended with a finite set of registers. We consider different instances, depending on whether the specification is nondeterministic, universal or deterministic, and depending on whether the number of registers of the implementation is given (bounded synthesis) or not. In the unbounded setting, we show undecidability for both universal and non-deterministic specifications, while decidability is recovered in the deterministic case. In the bounded setting, undecidability still holds for non-deterministic specifications, but can be recovered by disallowing tests over input data. The generic technique we use to show the latter result allows us to reprove some known result, namely decidability of bounded synthesis for universal specifications. (This contribution has been accepted and will be presented at CONCUR'19)

Amélie Gheerbrant

Best Answers over Incomplete Data : Complexity and First-Order Rewritings

Abstract: Answering queries over incomplete data is ubiquitous in data management and in many AI applications that use query rewriting to take advantage of relational database technology. In these scenarios one lacks full information on the data but queries still need to be answered with certainty. The certainty aspect often makes query answering unfeasible except for restricted classes, such as unions of conjunctive queries. In addition often there are no, or very few certain answers, thus expensive computation is in vain. Therefore we study a relaxation of certain answers called best answers. They are defined as those answers for which there is no better one (that is, no answer true in more possible worlds). When certain answers exist the two notions coincide. We compare different ways of casting query answering as a decision problem and characterise its complexity for first-order queries, showing significant differences in the behavior of best and certain answers.We then restrict attention to best answers for unions of conjunctive queries and produce a practical algorithm for finding them based on query rewriting techniques.

François Goasdoué

Query answering in ontological databases

Abstract:

Théodore Lopez

Determinisation of Finitely-Ambiguous Copyless Cost Register Automata

Abstract: Cost register automata (CRA) are machines reading an input word while computing values using write-only registers: values from registers are combined using the two operations, as well as the constants, of a semiring. Particularly interesting is the subclass of copyless CRAs where the content of a register cannot be used twice for updating the registers. Originally deterministic, non-deterministic variant of CRA may also be defined: the semantics is then obtained by combining the values of all accepting runs with the additive operation of the semiring (as for weighted automata). We show that finitely-ambiguous copyless non-deterministic CRAs (i.e. the ones that admit a bounded number of accepting runs on every input word) can be effectively transformed into an equivalent copyless (deterministic) CRA, without requiring any specific property on the semiring. As a corollary, this also shows that regular look-ahead can effectively be removed from copyless CRAs.

Vadim Malvone

Natural Strategic Ability

Abstract: In game theory, as well as in the semantics of game logics, a strategy can be represented by any function from states of the game to the agent’s actions. That makes sense from the mathematical point of view, but not necessarily in the context of human behavior. This is because humans are quite bad at executing complex plans, and also rather unlikely to come up with such plans in the first place. In this work, we adopt the view of bounded rationality, and look only at "simple" strategies in specifications of agents’ abilities. We formally define what "simple" means, and propose a variant of alternating-time temporal logic that takes only such strategies into account. We also study the model checking problem for the resulting semantics of ability.

Mikaël Monet

Logical Expressiveness of Graph Neural Networks

Abstract: Graph Neural Networks (GNNs) are a family of machine learning architectures that has recently become popular for applications dealing with structured data, such as molecule classification and knowledge graph completion. Recent work on the expressive power of GNNs has established a close connection between their ability to classify nodes in a graph and the Weisfeiler-Lehman (WL) test for checking graph isomorphism. In turn, a seminal result by Cai et al. establishes that the WL test is tightly connected to the two-variable fragment of first-order logic extended with counting capabilities (FOC2). However, these results put together do not seem to characterize the relationship between GNNs and FOC2. This motivates the following question: which FOC2 node properties are expressible by GNNs? We start by considering GNNs that update the feature vector of a node by combining it with the aggregation of the vectors of its neighbors; we call these aggregate-combine GNNs (AC-GNNs). On the negative side, we present a simple FOC2 node property that cannot be captured by any AC-GNN. On the positive side, we identify a natural fragment of FOC2 whose expressiveness is subsumed by that of AC-GNNs. This fragment corresponds to graded modal logic, or, equivalently, to the description logic ALCQ, which has received considerable attention in the knowledge representation community. Next we improve the AC-GNN architecture by allowing global readouts, where in each layer we can compute a feature vector for the whole graph and combine it with local aggregations; we call these aggregate-combine-readout GNNs (ACR-GNNs). In this setting, we prove that each FOC2 formula is captured by an ACR-GNN classifier. Besides their own value, these results put together indicate that readouts strictly increase the discriminative power of GNNs. (Ongoing work with Pablo Barceló, Egor Kostylev, Jorge Pérez, Juan Reutter and Juan Pablo Silva)

Le Thanh Dung Nguyen

Transductions in typed λ-calculi

Abstract: We propose a counterpart of (type-theoretic) implicit computational complexity for automata (in our case, tree transducers) instead of complexity classes, in the same way that monadic second-order logic provides a counterpart to descriptive complexity. Taking inspiration from higher-order model checking, we use Church encodings in typed λ-calculi for this purpose.

Pierre Ohlmann

Deciding existence of strategies for almost sure reachability in probabilistic population control systems

Abstract: We consider the following problem : given a Markov Decision Process (MDP) with global updates (meaning, where an action has a simultaneous effect on all states), does there exist a strategy enforcing that any number of tokens on a given initial state are transferred to a final state ? This problem was first posed by Bertrand, Dewaksar, Genest, Gimbert and Godbole and is largely motivated by control problems in biological systems. It was shown to be EXPTIME-complete in the case where agents exhibit a non-deterministic (or adversarial) behaviour. We show decidability of the decision problem in the probabilistic setting (MDP), mostly using tools of infinite duration games and regular cost functions. This is a joint work with Thomas Colcombet and Nathanaël Fijalkow.

Mickael Randour

Games Where You Can Play Optimally With Finite Memory

Abstract: Two-player games provide a natural framework for controller synthesis, with well-known connections with logic and automata. A seminal result in the study of two-player (turn-based) games on (finite) graphs is due to Gimbert and Zielonka [1], who established a complete characterization of preference relations (and consequently, winning objectives) for which players can play optimally with memoryless strategies, i.e., without looking at the past at all. Almost all simple objectives in the literature fall in this class and thus yield memoryless determined games. Still, the recent appearance of more complex objectives and, above all, the increasing resort to multi-objective games (often required to accurately account for trade-offs between several qualitative and quantitative aspects in real-world applications) has led to games where the players do need to use memory, which can be finite or even infinite in some cases (e.g., multi-mean-payoff games). In this talk, I will present ongoing research, where we tackle the question of finite-memory strategies in a flavor similar to Gimbert and Zielonka’s approach of memoryless strategies. We will see that obtaining a perfect equivalent of Gimbert and Zielonka’s results is out of the question, as counter-examples exist; and we will describe how we circumvent the related obstacles to achieve a general characterization of preference relations for which finite memory suffice to play optimally. My talk will focus on examples and key concepts to highlight the core challenges and fundamental differences with the memoryless case. This talk is based on joint work with Patricia Bouyer, Stéphane Le Roux, Youssouf Oualhadj and Pierre Vandenhove. [1] Hugo Gimbert, Wieslaw Zielonka: Games Where You Can Play Optimally Without Any Memory. CONCUR 2005: 428-442.

Suman Sadhukhan

Network Games with Synchronous Cost

Abstract: Network games [1, 2, 3] are multi-player, concurrent, non-zero-sum games on a directed graph. Each player selects a path from source to target on the game graph (“selfish” behavior), pays cost with positive or negative congestion effect (i.e, shares or pays more) depending on the setting. Unlike one- shot selection of the whole path, we define our model where players choose their path edge by edge, and can observe the edges chosen by others so far. A similar setting was considered in [4] but in a broader framework than network games. Our model also departs from most of the classical models in the way of computing the cost of a player: in our model a player shares cost/ suffer congestion with another player only when their path intersect at some step, i.e, synchronize at their intersection; while in most of the existing model it would pay congestion cost (resp. shares cost) due to others whenever they have some intersection in their paths, even when they don’t use that edge at the same step. We call our setting as Network games with synchronous cost. We study how “selfish” behavior affects on the total cost/payoff compared to a centralized setting in this model: how players achieve an Equilibrium, what measures like Price of Anarchy (PoA), Price of Stability (PoS) turn out to be. Our objective is to compare these “selfish” and “centralised” behaviors in our synchronous cost evaluation scheme. Different models for different applicability have been studied for network games. Most of the studies establish general bounds of PoA, PoS in their respective models. Our objective is to study decidability and computability issues such as, given a network game with synchronous cost, whether PoA (resp. PoS) is less than a given constant. We provide EXPTIME algorithms to precisely compute PoS and PoA for such an instance. [1] Tim Roughgarden and Eva Tardos. How bad is selfish routing? JACM, 49(2):236–259, 2002. [2] Tim Roughgarden. Selfish routing with atomic players. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1184–1185, 01 2005. [3] Guy Avni, Shibashis Guha, and Orna Kupferman. Timed network games. In 42nd International Sym- posium on Mathematical Foundations of Computer Science, MFCS, pages 37:1–37:16, 2017. [4] Guy Avni, Thomas A. Henzinger, and Orna Kupferman. Dynamic resource allocation games. In Pro- ceedings of the International Symposium on Algorithmic Game Theory, volume 9928, pages 153–166, 2016.

Sławek Staworko

Shape Expressions Schemas for RDF Semantics, Complexity, and Inference

Abstract: We will present our continuing work on Shape Expression Schemas (ShEx) for RDF graphs. ShEx is a schema formalism inspired by schemas of XML, such as XML Schemas and DTDs, and the underlying tree automata models. Not surprisingly, ShEx can be seen as a model of graph automata, and interestingly, one that is related to the standard notion of graph simulation. First, we will define the semantics of ShEx and identify its computational implications on the problem of validation. We will identify the problem of inference: given an RDF graph construct a ShEx that describes the structure of the input RDF. This study will require us to answer fundamental questions on containment of ShEx. Finally, we will outline future direction of study in repairing RDF w.r.t. a given ShEx.

Michaël Thomazo

Reasoning about Disclosure in Data Integration in the Presence of Source Constraints

Abstract: Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Data sources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources. In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.