
AMIE can extract closed Horn rules from medium-sized ontologies in a few minutes. We report the runtimes for AMIE+, the latest version of AMIE, which includes a set of runtime enhancements. AMIE and AMIE+ can sort and threshold on support, head coverage, standard confidence and PCA confidence. By default, AMIE+ uses a head coverage threshold of 0.01 and a minimum PCA confidence of 0.1, and disables the instantiation operator (atoms do not contain constants). Any deviations from these settings are explicitly mentioned.

YAGO

YAGO is a semantic knowledge base derived from Wikipedia, WordNet and GeoNames. The latest version, YAGO2s, contains 120M facts describing properties of 10M different entities. Since the rules output by AMIE are used for prediction, we used the previous version, YAGO2 (released in 2010), to predict facts in YAGO2s.

YAGO contains 120M facts about 2.6M entities. For both versions of the ontology we did not consider facts with literal objects or any schema information (rdf:type statements, relation signatures and descriptions). For YAGO2s, this is equivalent to using the file yagoFacts with 4.12M triples. For YAGO2, we used the file yago2core, which contains 948K facts after cleaning. The clean versions of YAGO2 and YAGO2s used in these experiments are available for download. Our experiments included comparisons against state-of-the-art systems which could not handle even our clean version of YAGO2. For this reason, we built a sample of this KB by randomly picking 10K entities and collecting their 3-hop subgraphs. In contrast to a random sample of facts, this method preserves the original graph topology.

DBpedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia. The English version of DBpedia contains 1.89 billion facts about 2.45M entities. In the spirit of our data prediction endeavours, we mined rules from DBpedia 2.0 to predict facts in DBpedia 3.8 (in English). In both cases we used the person data and infoboxes datasets and removed facts with literal objects and rdf:type statements. We also removed relations with fewer than 100 facts. This produced a clean subset of 6.7M facts for DBpedia 2.0 and 11.02M facts for DBpedia 3.8.

Wikidata

Wikidata is a free, community-based knowledge base maintained by the Wikimedia Foundation. The goal of the Wikidata project is to provide the same information as Wikipedia, but in a computer-readable format; that is, Wikidata can be seen as the structured sibling of Wikipedia. For our experiments we used a dump of Wikidata from December 2014. As with the other datasets, we removed literal facts and type information, leading to a clean set of 8.4M facts on which we ran AMIE.

Standard vs PCA confidence

Data prediction

In order to support the suitability of the PCA confidence metric for the prediction of new facts, we carried out an experiment which uses the rules mined by AMIE on YAGO2 (training KB) to predict facts in the newer YAGO2s (testing KB). We took all rules mined by AMIE with a head coverage threshold of 0.01 and ranked them by standard and PCA confidence. Then we took every rule and generated new facts by taking all bindings of the head variables in the body of the rule which are not in the head (sets B, C and D in our mining model).
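The fact-generation step of the prediction experiment, in which body bindings of a rule yield new head facts, can be sketched on a toy KB. The rule, relation names and triples below are invented for illustration; real rules are produced by AMIE.

```python
# Toy training KB: (subject, relation, object) triples.
kb = {
    ("Ann", "marriedTo", "Bob"),
    ("Bob", "livesIn", "Lyon"),
    ("Carl", "marriedTo", "Dana"),
    ("Carl", "livesIn", "Rome"),
}

def pairs(rel):
    return [(s, o) for s, r, o in kb if r == rel]

# Rule: marriedTo(x, z) ^ livesIn(z, y) => livesIn(x, y)
# Join the body atoms on the shared variable z, then read off the
# bindings of the head variables (x, y); keep only facts that are
# not already in the KB, i.e. genuinely new predictions.
predictions = {
    ("livesIn", x, y)
    for x, z in pairs("marriedTo")
    for z2, y in pairs("livesIn")
    if z == z2 and (x, "livesIn", y) not in kb
}
print(predictions)   # {('livesIn', 'Ann', 'Lyon')}
```

In the experiment, such predictions are checked against the newer KB (YAGO2s) to see whether the rule's ranking metric correlates with precision.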
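The ranking measures used above can be made concrete on a small example. A minimal sketch, with invented relations and facts: standard confidence divides the rule's support by all body bindings, whereas PCA confidence only counts bindings whose subject has at least one known head fact; head coverage divides the support by the size of the head relation.

```python
# Toy KB as a set of (subject, relation, object) triples.
kb = {
    ("Ann", "livesIn", "Paris"),
    ("Ann", "wasBornIn", "Paris"),
    ("Bob", "livesIn", "Lyon"),
    ("Bob", "wasBornIn", "Nice"),
    ("Carl", "livesIn", "Rome"),   # Carl's birthplace is unknown
}

def facts(rel):
    return {(s, o) for s, r, o in kb if r == rel}

# Rule: livesIn(x, y) => wasBornIn(x, y)
body, head = facts("livesIn"), facts("wasBornIn")

support = len(body & head)            # body bindings that are known head facts
std_den = len(body)                   # all body bindings
pca_den = len({(x, y) for x, y in body            # only bindings where x has
               if any(x2 == x for x2, _ in head)})  # SOME known birthplace

print(support / std_den)   # standard confidence: 1/3
print(support / pca_den)   # PCA confidence: 1/2 (Carl is not counted)
print(support / len(head)) # head coverage: 1/2
```

Under the partial completeness assumption, Carl's missing birthplace is treated as unknown rather than as a counterexample, which is why the PCA denominator is smaller.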
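The cleaning applied to every dataset (dropping rdf:type statements, facts with literal objects, and sparse relations) could be sketched as below. The function name is ours, and the quoted-string test for literals is a simplifying assumption; a real pipeline would inspect the RDF term type.

```python
from collections import Counter

def clean_kb(triples, min_facts=100):
    """Remove type statements, literal-object facts, and relations
    with fewer than `min_facts` facts."""
    # Literals are assumed to be quoted strings here (illustrative only).
    kept = [(s, r, o) for s, r, o in triples
            if r != "rdf:type" and not o.startswith('"')]
    # Count facts per relation and drop sparse relations.
    counts = Counter(r for _, r, _ in kept)
    return [(s, r, o) for s, r, o in kept if counts[r] >= min_facts]
```

For the experiments described above the threshold would be `min_facts=100`; a smaller value is convenient for toy data.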
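The sampling procedure used for the comparison against systems that could not handle the full KB — picking seed entities and collecting their 3-hop subgraphs — can be sketched as a breadth-first expansion. The function name and formulation are our own; in the experiment the seeds would be 10K entities drawn at random, e.g. `random.sample(entities, 10_000)`.

```python
from collections import defaultdict

def sample_subgraphs(triples, seeds, hops=3):
    """Collect every fact within `hops` hops of the seed entities.
    Unlike sampling random facts, this preserves local graph topology."""
    adj = defaultdict(list)              # entity -> incident triples
    for s, r, o in triples:
        adj[s].append((s, r, o))
        adj[o].append((s, r, o))
    visited, sampled = set(), set()
    frontier = set(seeds)
    for _ in range(hops):
        nxt = set()
        for e in frontier - visited:
            visited.add(e)
            for s, r, o in adj[e]:       # keep all facts touching e
                sampled.add((s, r, o))
                nxt.update((s, o))
        frontier = nxt - visited         # expand one hop outward
    return sampled
```

Because whole neighbourhoods are kept, rules with multi-atom bodies can still find joinable facts inside the sample, which a uniform sample of triples would largely destroy.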
