Extending neomodel¶
Inheritance¶
Neomodel extends the ability to compose classes by inheritance to the backend. This makes it possible to create a node class which extends the functionality that neomodel provides (such as neomodel.contrib.SemiStructuredNode).
Creating purely abstract classes is achieved using the __abstract_node__ property on base classes:
class User(StructuredNode):
__abstract_node__ = True
name = StringProperty(unique_index=True)
class Shopper(User):
balance = IntegerProperty(index=True)
def credit_account(self, amount):
self.balance = self.balance + int(amount)
self.save()
Custom label¶
By default, neomodel uses the class name as the label for nodes. This can be overridden by setting the __label__ property on the class:
class PersonClass(StructuredNode):
__label__ = "Person"
name = StringProperty(unique_index=True)
Creating a PersonClass instance and saving it to the database will result in a node with the label “Person”.
Optional Labels¶
Sometimes it is useful to allow a node to have extra labels in addition to the ones Neomodel defines by default using class names.
Nemodel constructs sets of labels to give to a node in Neo4j by looking at the names of the node classes that node is/inherits from. It also constructs a mapping of the reverse, expected node labels to node classes, in order to do object resolution.
In order for object resolution to work on Node Classes that could have extra labels, the __optional_labels__ property must be defined as a list of strings:
class Shopper(StructuredNode):
__optional_labels__ = ["SuperSaver", "SeniorDiscount"]
balance = IntegerProperty(index=True)
Note
The size of the node class mapping grows exponentially with optional labels. Use with some caution.
Mixins¶
Mixins can be used to share functionality between nodes classes:
class UserMixin:
name = StringProperty(unique_index=True)
password = StringProperty()
class CreditMixin:
balance = IntegerProperty(index=True)
def credit_account(self, amount):
self.balance = self.balance + int(amount)
self.save()
class Shopper(StructuredNode, UserMixin, CreditMixin):
pass
jim = Shopper(name='jimmy', balance=300).save()
jim.credit_account(50)
Please note that it has to be ensured that the mixins do not inherit from StructuredNode but that the concrete class does.
Overriding the StructuredNode constructor¶
When defining classes that require a custom __init__(self, ...)
constructor,
the super() class constructor must also be called always.
This is a neomodel
design convention that must be followed very strictly or risk breaking the whole process of
instantiating a model with data retrieved from the database.
For example, suppose a scenario where it should be possible for an Item
entity to also be instantiated via
a Product
entity. One way to achieve this, would be to have Item
’s constructor accept a product
parameter:
class Item(StructuredNode): name = StringProperty(unique_index=True) uid = StringProperty(unique_index=True) def __init__(self, product=None, *args, **kwargs): if product is not None: self.product = product kwargs["uid"] = 'g.' + str(self.product.pk) kwargs["name"] = self.product.product_name super().__init__(*args, **kwargs)
Note here that it is impossible to automatically infer that product
is a parameter that is only used in the
derivation of Item
’s attributes and the objective is to preserve the ability to instantiate Item
both via a
product
and simply via keyword arguments.
A more elegant way to provide the same functionality here would be to leave Item
’s constructor as is and provide an
additional function (e.g. from_product()
) for the alternative means of initialising the entity.
The first way of achieving this functionality and involves optional variables is probably easier to handle in Python 3 onwards (due to less restrictions in handling positional and keyword arguments) while the second way that involves setting up a separate function might be more preferable in earlier versions of Python.
It is also important to note that StructuredNode’s constructor will override properties set (which are defined on the class). Therefore constructor parameters must be passed via kwargs (as above). These can also be set after calling the constructor but this would skip validation.
Automatic class resolution¶
Neomodel is able to transform nodes to native data model objects, automatically, via a node-class registry that is progressively built up during the definition of the models.
This registry is a dictionary that provides a mapping from the set of labels associated with a node to the class that is implied by this set of labels.
Consider for example the following snippet of code:
import neomodel
class BasePerson(neomodel.StructuredNode):
pass
class TechnicalPerson(BasePerson):
pass
class PilotPerson(BasePerson):
pass
class UserClass(StructuredNode):
__label__ = "User"
Once this script is executed, the node-class registry would contain the following entries:
{"BasePerson"} --> class BasePerson
{"BasePerson", "TechnicalPerson"} --> class TechnicalPerson
{"BasePerson", "PilotPerson"} --> class PilotPerson
{"User"} --> class UserClass
Therefore, a Node
with labels "BasePerson", "TechnicalPerson"
would lead to the instantiation of a
TechnicalPerson
object. This automatic resolution is optional and can be invoked automatically via
neomodel.Database.cypher_query
if its resolve_objects
parameter is set to True
(the default is False
).
This automatic class resolution however, requires a bit of caution:
As a consequence of the way the node-class registry is built up and used, if a query results in instantiating an object whose class definition has not yet been imported, then exception
neomodel.exceptions.ModelDefinitionMismatch
will be raised.Given the above class hierarchy, suppose that each of the classes
BasePerson
,TechnicalPerson
,PilotPerson
were defined in separate files / modules and a script only included:from base_models import BasePerson from pilot_models import PilotPerson
Then, this would mean that the
BasePerson, TechnicalPerson --> TechnicalPerson
entry would not have been created in the node-class registry and therefore it would be impossible to resolve any Node objects (if they happened to come up in a query) to an application specific object.
Since the only way to resolve objects at runtime is this mapping of a set of labels to a class, then this mapping must be guaranteed to be unique. Therefore, if for any reason a class gets redefined, then exception
neomodel.exceptions.ClassAlreadyDefined
will be raised.Given the above class hierarchy, suppose that an attempt was made to redefine one of the existing classes in the local scope of some function
import neomodel class BasePerson(neomodel.StructuredNode): pass class TechnicalPerson(BasePerson): pass class PilotPerson(BasePerson): pass def some_function(): class PilotPerson(BasePerson): pass
If this was left unchecked and once
some_function()
executes, it would replace the mapping of{"BasePerson", "PilotPerson"}
toPilotPerson
in the global scope with a mapping of the same set of labels but towards the class defined within the local scope ofsome_function
.
Two classes with different names but the same __label__ override will also result in a
ClassAlreadyDefined
exception. This can be avoided under certain circumstances, as explained in the next section on ‘Database specific labels’.
Both ModelDefinitionMismatch
and ClassAlreadyDefined
produce an error message that returns the labels of the
node that created the problem (either the Node returned from the database or the class that was attempted to be
redefined) as well as the state of the current node-class registry. These two pieces of information can be used to
debug the model mismatch further.
Database specific labels¶
Only for Neo4j Enterprise Edition, with multiple databases
In some cases, it is necessary to have a class with a label that is not unique across the database. This can be achieved by setting the __target_databases__ property to a list of strings
class PatientOne(AsyncStructuredNode):
__label__ = "Patient"
__target_databases__ = ["db_one"]
name = StringProperty()
class PatientTwo(AsyncStructuredNode):
__label__ = "Patient"
__target_databases__ = ["db_two"]
identifier = StringProperty()
In this example, both PatientOne and PatientTwo have the label “Patient”, but these will be mapped in a database-specific node-class registry.
Now, if you fetch a node with label Patient from your database with auto resolution enabled, neomodel will try to resolve it to the correct class based on the database it was fetched from
db.set_connection("bolt://neo4j:password@localhost:7687/db_one")
patients = db.cypher_query("MATCH (n:Patient) RETURN n", resolve_objects=True) --> instance of PatientOne
The following will result in a ClassAlreadyDefined
exception, because when retrieving from db_one
,
neomodel would not be able to decide which model to parse into
class GeneralPatient(AsyncStructuredNode):
__label__ = "Patient"
name = StringProperty()
class PatientOne(AsyncStructuredNode):
__label__ = "Patient"
__target_databases__ = ["db_one"]
name = StringProperty()
Warning
This does not prevent you from saving a node to the “wrong database”. So you can still save an instance of PatientTwo to database “db_one”.
neomodel
under multiple processes and threads¶
It is very important to realise that neomodel preserves a mapping of the set of labels associated with the Neo4J Data Base Management System (DBMS) Node to the Python class this node corresponds to within a class hierarchy. Detailed information about this is available in Automatic class resolution.
This mapping is preserved within the same process along with transaction information.
Once a script that uses neomodel starts up, it imports its model definitions and starts communicating with the database within its own process.
neomodel internally creates a new session and through that session creates any additional transactions if required.
neomodel internally creates and updates a node-class registry.
Any additional threads spun up from this process will re-use the node-class registry.
Multiple calls to transaction handling functions will re-use a transaction if one is already going on within the same thread.
Separate threads can start different transactions but all of these transactions will be executed within the same session.
A script can still use neomodel across more than one processes as long as it gets re-initialised within each process
to the desired state. That is, once a new process starts, the neomodel.db
object will be re-initialised and the new
process would have to import any application specific models it requires for its operation. As the two processes are
independent, they will start different sessions to the Neo4j DBMS.
Any transactions occurring within the same session will take care of constraints and indices without any special care. However, transactions across different sessions are not aware of each other and therefore can lead to database exceptions.
For example, if an entity is declared with a unique index on one of its properties and two threads spun up from the
same process attempt a get_or_create
, then one of them will create
the node and the other will get
it.
No exceptions will be raised and get_or_create
would have proceeded as expected. However, if the exact same scenario
was attempted over transactions in two completely different sessions, then get_or_create
would appear to have
proceeded as expected in both of them, but one of them would further receive an exception about violating the uniqueness
constraint (which is not exactly what is expected when a get_or_create
is executed).
Both of these conditions: Multiple threads spun from a single process and multiple processes spun from a main process, are very relevant to the operation of neomodel over Neo4J Clusters and the way tests might be invoked.
A high throughput cluster environment (a few CORE clusters surrounded by many READ_REPLICAs) can use neomodel with
bolt+routing:
over multiple threads to issue parallel read queries (over explicitly declared READ transactions).
The same however would not work for parallel WRITE transactions because they all get processed within the
same session and there is no performance gain. In that case, the only solution would be to use neomodel over
multiple processes but ensure beforehand that any operations will not create conflicts (or anticipate and resolve
gracefully the exceptions that might be raised).
Similar considerations should also be given when writing tests for specific test modes. For example, pytest
collects tests within a directory and launches them in their own context and pytest-xdist
and pytest-forked
can run tests in a distributed / parallel mode. Exactly the same considerations regarding initialising / re-initialising
neomodel apply here as well and at the very minimum, you should ensure that tests either re-use classes, wherever
possible, or do not re-use the same class names within the same context of execution.