PROV

In this document we map simple script constructs to plain PROV.

Names, literals, and constants

Entities represent variable names, literals (e.g., "a", 1, True), and constants (e.g., ...).

1     # literal
"a"   # literal
b"a"  # literal
True  # literal
int   # name
...   # constant

prefix script <https://dew-uff.github.io/versioned-prov/ns/script#>

entity(1, [value="1", type="script:literal"])
entity(a, [value="'a'", type="script:literal"])
entity(a#2, [value="b'a'", type="script:literal"])
entity(True, [prov:value="True", type="script:constant"])
entity(int, [prov:value="<class 'int'>", type="script:name", label="int"])
entity(ellipsis, [prov:value="Ellipsis", type="script:constant", label="..."])

Assignment

We represent an assignment by an activity that uses the entities on the right side to generate an entity on the left side.

An assignment creates a new entity for the name on the left side even when the name already exists.

m = 10000

prefix script <https://dew-uff.github.io/versioned-prov/ns/script#>

entity(10000, [value="10000", type="script:literal"])
entity(m, [value="10000", type="script:name", label="m"])

activity(assign1, [type="script:assign"])
wasDerivedFrom(m, 10000, assign1, g1, u1)
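
The rule that every assignment creates a new entity, even for an existing name, can be sketched with a small naming helper. This is only an illustration under stated assumptions: `EntityNamer` and `assign` are hypothetical names, the "#2" suffix imitates identifiers such as d#2 used later in this document, and the second assignment (assign5, g12, u13) is invented for the example.

```python
from collections import Counter


class EntityNamer:
    """Hand out a fresh entity identifier every time a name is assigned.

    Assumption: the "#2", "#3", ... suffix imitates identifiers such as d#2
    used in this document; PROV itself does not prescribe it.
    """

    def __init__(self):
        self._count = Counter()

    def fresh(self, name):
        self._count[name] += 1
        n = self._count[name]
        return name if n == 1 else f"{name}#{n}"


def assign(namer, name, source_id, value, activity_id, gen_id, use_id):
    """Return the PROV-N statements for one assignment (hypothetical helper)."""
    target = namer.fresh(name)
    return [
        f'entity({target}, [value="{value!r}", type="script:name", label="{name}"])',
        f'activity({activity_id}, [type="script:assign"])',
        f"wasDerivedFrom({target}, {source_id}, {activity_id}, {gen_id}, {use_id})",
    ]


namer = EntityNamer()
first = assign(namer, "m", "10000", 10000, "assign1", "g1", "u1")
# Hypothetical re-assignment of m: the name exists, yet a new entity is created.
second = assign(namer, "m", "1", 1, "assign5", "g12", "u13")
print("\n".join(first + second))
```

The first call reproduces the three statements above; the second one yields an entity named m#2 instead of reusing m.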

Operation

Similar to assignments, we also use activities to map operations. However, instead of producing an entity for a variable name, an operation produces an entity for the evaluation result.

m + 1

entity(1, [value="1", type="script:literal"])
entity(sum, [value="10001", type="script:sum", label="m + 1"])

activity(+, [type="script:operation"])
wasDerivedFrom(sum, m, +, g2, u2)
wasDerivedFrom(sum, 1, +, g2, u3)
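
In the mapping above, both derivations share the generation identifier g2, while each operand has its own usage identifier. A minimal sketch of that pattern follows; `operation_to_prov` is a hypothetical helper, not part of any PROV library.

```python
def operation_to_prov(result_id, operand_ids, activity_id, gen_id, use_ids):
    """Return wasDerivedFrom statements for one operation (hypothetical helper).

    All operands share the generation `gen_id` of the result, while each
    operand gets its own usage identifier, as in the g2/u2/u3 example.
    """
    if len(operand_ids) != len(use_ids):
        raise ValueError("one usage identifier is needed per operand")
    return [
        f"wasDerivedFrom({result_id}, {operand}, {activity_id}, {gen_id}, {use})"
        for operand, use in zip(operand_ids, use_ids)
    ]


derivations = operation_to_prov("sum", ["m", "1"], "+", "g2", ["u2", "u3"])
print("\n".join(derivations))
```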

List definition

A list is represented by an entity with hadMember relationships to its parts.

The provenance of a Floyd-Warshall execution should indicate the position of accessed elements in the result matrix (a list of lists) to allow querying the shortest path between two nodes. However, the hadMember relationship alone does not tell us at which position of the list a member sits (note below that the same entity may appear at multiple positions). Thus, to allow this query, we create an extra entity for every position in the list and use an activity to derive these entities from the actual entities that compose the list.

For simplicity, in the case of the definition of matrices, we use a single activity to represent all the derivations, instead of an activity for each row.

[m, m + 1, m]

entity(list, [value="[10000, 10001, 10000]", type="script:list", label="[m, m + 1, m]"])
entity(list0, [value="10000", type="script:item", label="m"])
entity(list1, [value="10001", type="script:item", label="m + 1"])
entity(list2, [value="10000", type="script:item", label="m"])

hadMember(list, list0)
hadMember(list, list1)
hadMember(list, list2)

activity(definelist1, [type="script:definelist"])
wasDerivedFrom(list0, m, definelist1, g3, u4)
wasDerivedFrom(list1, sum, definelist1, g4, u5)
wasDerivedFrom(list2, m, definelist1, g5, u6)
wasGeneratedBy(list, definelist1, -)
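
The reason for the per-position entities can be seen directly in Python. The sketch below compares plain membership, which behaves like a set, with one item per position; `item_entities` is a hypothetical name, and the list0, list1, list2 keys simply follow the identifiers of the example above.

```python
m = 10000
values = [m, m + 1, m]

# Membership alone behaves like a set: the repeated m collapses and the
# position of each value is lost.
members = set(values)

# One item entity per position keeps the index. The list0, list1, ... keys
# follow this document's example; `item_entities` is a hypothetical name.
item_entities = {f"list{index}": value for index, value in enumerate(values)}

print(len(members))   # 2
print(item_entities)  # {'list0': 10000, 'list1': 10001, 'list2': 10000}
```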

Assignment of list definition

When we assign a list definition to a variable, we must create a new entity for the variable and relate it, through hadMember, to every part of the list.

d = [m, m + 1, m]

entity(d, [value="[10000, 10001, 10000]", type="script:name", label="d"])
hadMember(d, list0)
hadMember(d, list1)
hadMember(d, list2)

activity(assign2, [type="script:assign"])
wasDerivedFrom(d, list, assign2, g6, u7)

The same mapping is valid for assignments to names that represent dictionaries.

x = d

entity(x, [value="[10000, 10001, 10000]", type="script:name", label="x"])

hadMember(x, list0)
hadMember(x, list1)
hadMember(x, list2)

activity(assign3, [type="script:assign"])
wasDerivedFrom(x, d, assign3, g7, u8)
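
The entity x shares the members list0, list1, and list2 with d because, in Python, the assignment binds a second name to the same list object rather than copying it. A quick check of that behavior:

```python
m = 10000
d = [m, m + 1, m]
x = d  # binds a second name to the same list object, no copy is made

same_object = x is d
print(same_object)  # True
print(x == d)       # True
```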

Function call

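We map a function call to an activity that uses its arguments and generates an entity for its return value.

When we do not know the implementation of the called function, we cannot assert derivation relationships between the result and the arguments, so we only state that the call used the arguments and generated the result.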

len(d)

entity(len_d, [value="3", type="script:eval", label="len(d)"])

activity(call1, [type="script:call", label="len"])
used(call1, d, -)
wasGeneratedBy(len_d, call1, -)
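
As a sketch of this mapping, the helper below emits one used statement per argument and a single wasGeneratedBy statement for the result, with no derivation. The name `call_to_prov` and its parameters are hypothetical and chosen only for this illustration.

```python
def call_to_prov(call_id, func_name, arg_ids, result_id, result_value, label):
    """Return PROV-N statements for an opaque function call (hypothetical helper)."""
    statements = [
        f'entity({result_id}, [value="{result_value!r}", type="script:eval", label="{label}"])',
        f'activity({call_id}, [type="script:call", label="{func_name}"])',
    ]
    # The implementation is unknown, so we record usage only, never derivation.
    statements += [f"used({call_id}, {arg}, -)" for arg in arg_ids]
    statements.append(f"wasGeneratedBy({result_id}, {call_id}, -)")
    return statements


d = [10000, 10001, 10000]
statements = call_to_prov("call1", "len", ["d"], "len_d", len(d), "len(d)")
print("\n".join(statements))
```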

Access to part of structure

We map an access to an activity that generates the accessed entity. The activity uses the list entity, the list element, and the index when the index is explicitly used (for-each loops iterate over lists without explicit item entities). The generated entity derives from the list element.

d[0]

entity(0, [value="0", type="script:literal"])
entity(d@0, [value="10000", type="script:access", label="d[0]"])

activity(access1, [type="script:access"])
used(access1, d, -)
used(access1, 0, -)
wasDerivedFrom(d@0, list0, access1, g8, u9)
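
The optional index can be sketched as an optional parameter: when the code names an index, the activity uses it; when it does not, as in a for-each loop, that statement is omitted. `access_to_prov` and the identifiers in the second call (item, access2, g20, u20) are hypothetical.

```python
def access_to_prov(collection_id, item_id, result_id, activity_id,
                   gen_id, use_id, index_id=None):
    """Return PROV-N statements for an access (hypothetical helper)."""
    statements = [f"used({activity_id}, {collection_id}, -)"]
    if index_id is not None:
        # Only emitted when the index appears explicitly in the code.
        statements.append(f"used({activity_id}, {index_id}, -)")
    statements.append(
        f"wasDerivedFrom({result_id}, {item_id}, {activity_id}, {gen_id}, {use_id})"
    )
    return statements


# d[0]: explicit index, as in the example above.
indexed = access_to_prov("d", "list0", "d@0", "access1", "g8", "u9", index_id="0")
# Hypothetical for-each style access: no index entity is involved.
iterated = access_to_prov("d", "list0", "item", "access2", "g20", "u20")
print("\n".join(indexed))
```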

Assignment to part of structure

A part assignment is similar to an assignment, but it also creates a new version of the collection entity, with hadMember relationships to the new part and to the other parts that remain valid.

If more than one variable or data structure holds a reference to the changed list, we must create a new version of each of them.

The assignment activity uses all the changed entities and generates new versions of them. Additionally, it uses the right side of the assignment to derive an entity for the left side.

d[1] = 3

entity(3, [value="3", type="script:literal"])
entity(d@1, [value="3", type="script:access", label="d[1]"])

activity(assign4, [type="script:assign"])
used(assign4, 1, -)
wasDerivedFrom(d@1, 3, assign4, g9, u10)

entity(d#2, [value="[10000, 3, 10000]", type="script:name", label="d"])
wasDerivedFrom(d#2, d, assign4, g10, u11)
wasDerivedFrom(d#2, 3, assign4, g10, u10)
hadMember(d#2, list0)
hadMember(d#2, d@1)
hadMember(d#2, list2)

entity(x#2, [value="[10000, 3, 10000]", type="script:name", label="x"])
wasDerivedFrom(x#2, x, assign4, g11, u12)
wasDerivedFrom(x#2, 3, assign4, g11, u10)
hadMember(x#2, list0)
hadMember(x#2, d@1)
hadMember(x#2, list2)
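
Both d and x receive a new version because they refer to the same list object, so the change is visible through either name. The sketch below checks this in Python and finds the affected names by object identity; `names_referencing`, `scope`, and the independent copy `y` are hypothetical additions for this illustration.

```python
def names_referencing(namespace, target):
    """Names in `namespace` bound to the very same object as `target`."""
    return sorted(name for name, obj in namespace.items() if obj is target)


m = 10000
d = [m, m + 1, m]
x = d        # alias: same list object as d
y = list(d)  # hypothetical independent copy, for contrast

scope = {"d": d, "x": x, "y": y}
d[1] = 3

affected = names_referencing(scope, d)
print(affected)  # ['d', 'x']
print(x)         # [10000, 3, 10000]
print(y)         # [10000, 10001, 10000]
```

Only the names returned by `names_referencing` would need new versions such as d#2 and x#2; the copy `y` keeps its original members.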

Full graph

The full mapping for the previous code is presented below:

>>> m = 10000
>>> d = [m, m + 1, m]
>>> x = d
>>> len(d)
3
>>> d[0]
10000
>>> d[1] = 3

prefix script <https://dew-uff.github.io/versioned-prov/ns/script#>
    
// assignment
entity(10000, [value="10000", type="script:literal"])
entity(m, [value="10000", type="script:name", label="m"])

activity(assign1, [type="script:assign"])
wasDerivedFrom(m, 10000, assign1, g1, u1)

// operation
entity(1, [value="1", type="script:literal"])
entity(sum, [value="10001", type="script:sum", label="m + 1"])

activity(+, [type="script:operation"])
wasDerivedFrom(sum, m, +, g2, u2)
wasDerivedFrom(sum, 1, +, g2, u3)

// list definition
entity(list, [value="[10000, 10001, 10000]", type="script:list", label="[m, m + 1, m]"])
entity(list0, [value="10000", type="script:item", label="m"])
entity(list1, [value="10001", type="script:item", label="m + 1"])
entity(list2, [value="10000", type="script:item", label="m"])

hadMember(list, list0)
hadMember(list, list1)
hadMember(list, list2)

activity(definelist1, [type="script:definelist"])
wasDerivedFrom(list0, m, definelist1, g3, u4)
wasDerivedFrom(list1, sum, definelist1, g4, u5)
wasDerivedFrom(list2, m, definelist1, g5, u6)
wasGeneratedBy(list, definelist1, -)

// list assignment
entity(d, [value="[10000, 10001, 10000]", type="script:name", label="d"])
hadMember(d, list0)
hadMember(d, list1)
hadMember(d, list2)

activity(assign2, [type="script:assign"])
wasDerivedFrom(d, list, assign2, g6, u7)


// list assignment 2
entity(x, [value="[10000, 10001, 10000]", type="script:name", label="x"])

hadMember(x, list0)
hadMember(x, list1)
hadMember(x, list2)

activity(assign3, [type="script:assign"])
wasDerivedFrom(x, d, assign3, g7, u8)

// call
entity(len_d, [value="3", type="script:eval", label="len(d)"])

activity(call1, [type="script:call", label="len"])
used(call1, d, -)
wasGeneratedBy(len_d, call1, -)

// part access
entity(0, [value="0", type="script:literal"])
entity(d@0, [value="10000", type="script:access", label="d[0]"])

activity(access1, [type="script:access"])
used(access1, d, -)
used(access1, 0, -)
wasDerivedFrom(d@0, list0, access1, g8, u9)

// part assignment
entity(3, [value="3", type="script:literal"])
entity(d@1, [value="3", type="script:access", label="d[1]"])

activity(assign4, [type="script:assign"])
used(assign4, 1, -)
wasDerivedFrom(d@1, 3, assign4, g9, u10)

entity(d#2, [value="[10000, 3, 10000]", type="script:name", label="d"])
wasDerivedFrom(d#2, d, assign4, g10, u11)
wasDerivedFrom(d#2, 3, assign4, g10, u10)
hadMember(d#2, list0)
hadMember(d#2, d@1)
hadMember(d#2, list2)

entity(x#2, [value="[10000, 3, 10000]", type="script:name", label="x"])
wasDerivedFrom(x#2, x, assign4, g11, u12)
wasDerivedFrom(x#2, 3, assign4, g11, u10)
hadMember(x#2, list0)
hadMember(x#2, d@1)
hadMember(x#2, list2)