coMakeIT - Xebia Logo

Moving from traditional Relational database systems to document-oriented database systems (also known as No-SQL) is a part of transformation and modernization of applications. But, there is a possibility of either over or under engineering while modelling the data for an application. This paper narrates the optimal ways of data modelling for document-oriented database system in order to achieve optimal performance.

Relation database systems organize data in the form of tables having rows and columns, whereas a document-oriented database organizes the data in JSON (JavaScript Object Notation) format as a document.

From the data organization perspective, major difference between SQL and No-SQL database system is, relational database system stores the data for any given object, typically across multiple tables by applying the normalization principles whereas the No-SQL database systems stores all the information related to an object in a single document without needing to search for in any other document. The subtle act of maintaining all the information of an object in a single document is often confused and could lead to performance issues. Care has to be taken to arrive at the optimized data-model, and follow the principles specified below.

Normalization vs De-Normalization

Neither of the principles (Normalization and De-Normalization) have an advantage over the other, however the practice depends on the data persona. The trade-off between normalization and de-normalization lies in access throughput and integrity of data.

For example, consider the below document object:

Customer: {
	‘_id’: “cust_001”, 
	'Name':"Customer_One",
	‘Address’:”Address details”, 
	'Orders_List': [
[0]: {
			'Type': "Sports",
			'SubType': "Tennis",
			'Item': "Racquet",
			'ItemSubType': "Wide-Band",
			'OrderStatus': "Active",
			'Quantity': 1,
			'DeliverySchedule':"2021-June-23",
			'DeliveryStatus':"In-Progress As Planned"
		}
	]	
}

In the above example, Customer document object, which also has the Order details as a sub-document (part of the list, Composite relation) is in de-normalized state.

Consider a scenario where all orders that are ‘Yet to be delivered’ and of type ‘Sports’ have to be displayed and/or managed (updated), one has to scan all the customer documents and identify such documents meeting the criteria and thereafter can be managed. In this scenario it would be inefficient.

Re-engineered the data-model by following the normalization principle and it can be shown as below:

Customer: {
	'Name':"Customer_One",
	‘Address’:”Address details”, 
	'Orders_List': [
[0]: “Order_1”
	]	
}

Order: {
	‘_id’: “Order_1”,
	'Type': "Sports",
	'SubType': "Tennis",
	'Item': "Racquet",
	'ItemSubType': "Wide-Band",
	'OrderStatus': "Active",
	'Quantity': 1,
	'DeliverySchedule':"2021-June-23",
	'DeliveryStatus':"In-Progress As Planned"
}

The documents of type ‘Customer’, and ‘Order’ are normalized and share associated relationship by having a reference into Order from the ‘Customer’ document.  This data-model is an optimal for the scenarios:

Here both the principles of normalization and de-normalization have their own advantages over the other and much depends on the persona of the data.

Normalization or De-Normalization of data-model depends on the factors:

Redundant

Data model contains redundant data because:

For typical relational data model designers, this appears to be awkward, however well-designed and orchestrated write operations help in improving the read-performance significantly.

Considering the same example, the data model can be re-engineered as:

Customer: {
	‘_id’: “cust_001”, 
	'Name':"Customer_One",
	‘Address’:”Address details”, 
	'Orders_List': [
[0]: {
			'Type': "Sports",
			'SubType': "Tennis",
			'Item': "Racquet",
			'ItemSubType': "Wide-Band",
			'OrderStatus': "Active",
			'Quantity': 1,
			'DeliverySchedule':"2021-June-23",
			'DeliveryStatus':"In-Progress As Planned"
		}
	]	
}
Order: {
	‘_id’: “Order_1”,
	'Type': "Sports",
	'SubType': "Tennis",
	'Item': "Racquet",
	'ItemSubType': "Wide-Band",
	'OrderStatus': "Active",
	'Quantity': 1,
	'DeliverySchedule':"2021-June-23",
	'DeliveryStatus':"In-Progress As Planned"
}

Here the document of type ‘Order’ is redundant, however such a model enables:

However, a data model having redundant data requires multiple write operations and a good design of orchestration is needed to ensure no significant impact due to multiple write-operations on the system.

Control vs Data

While modelling the data, often the principle of keeping all the information related to an object in a single document, misleads to storing the information related to controlling the data along with the actual data itself. For example, the below document-data indicates an Order of type ‘Sports’

Order:
{
	'Type': "Sports",
	'SubType': "Tennis",
	'Item': "Racquet",
	'ItemSubType': 'Wide-Band',
	'Quantity': 1
	'Role': {
		'Type':'OrderManagement',
		'SubType':'Sports'
	}
}

Well, this document contains information related to an ‘Order’, however it also contains an information related to who can manage the document, that is indicated by sub-document ‘Role’. Here ‘Role’ information is to control the actual document. This approach has the draw-back, as:

An optimal modelling of the data can be viewed in this scenario as:

Order:
{
	'Type': "Sports",
	'SubType': "Tennis",
	'Item': "Racquet",
	'ItemSubType': 'Wide-Band',
	'Quantity': 1
}
	
RoleMapping:
{
	'Type': "OrderManagement",
	'SubType': "Sports",
	'OrderTYpe': {
		'Type':'Sports',
		'SubType':'Tennis'
	}
}

Here the document is normalized by keeping the information related to controlling the document in a separate document. By using the underneath feature called ‘Indexing’ of the database system, all the order documents of type ‘Sports’ and ‘Tennis’ can be fetched for a user role of type ‘Order Management’, and ‘Sports’; a quicker and direct search of the documents has far better performance compared to an earlier data-model.

In this scenario, the essence of modelling was based on data that characterizes itself.

Sub-Documents and Arrays

MongoDB expects the arrays to be of consistent size and is not suitable for continuous growth of an array. Considering this fact, while embedding the document objects, in another document, containment of documents in an array would not be an optimal choice of design, normalizing of the documents is an optimal choice of design.

Conclusion

For more insights, reach out to our team of MongoDB experts constantly engaged in improving management of various data systems. For more information, please visit – https://www.comakeit.com/contact-us/

One Response

  1. Great Blog. I recently use data modelling tool and they build amazing products. You can use this and provides end-to-end workflow tools to align and empower the team. Go and check out once…..!!!!!!!

    Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *