Super column family
A super column family is a NoSQL object that contains column families. It is a tuple (a key-value pair) in which the key is mapped to a value that is a set of column families.[1] By analogy with relational databases, a super column family is something like a "view" on a number of tables. It can also be seen as a map of tables.[2]
Benefits
When building a data model, it is useful to have some kind of view on a number of tables. A super column family plays a similar role in distributed data stores. There are, however, no "joins" between the "tables", because data stores such as Apache Cassandra are non-relational.
Sorting and querying
Super columns cannot be re-sorted after they have been inserted, and distributed data stores do not support arbitrary queries. Super columns are sorted when they are added to the column family, and a different sorting attribute can be used for the columns contained in a super column. As with a standard column family, sorting is defined by an attribute. In Apache Cassandra this attribute is called CompareSubcolumnsWith and can have the following values:
AsciiType
BytesType
LexicalUUIDType
LongType
TimeUUIDType
UTF8Type
Although the super columns can be sorted one way and the columns inside them another way, it is not possible to sort only some of the super columns differently from the rest.[3]
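The two-level ordering can be illustrated with a minimal Python sketch. It assumes that each comparator can be imitated by a plain sort key; the `COMPARATORS` table and the `sorted_row` helper are hypothetical names made up for this illustration and are not part of Cassandra's API.

```python
# Hypothetical stand-ins for two of the comparator types listed above.
# These key functions only imitate the ordering; they are not Cassandra code.
COMPARATORS = {
    "UTF8Type": lambda name: name,       # order names as strings
    "LongType": lambda name: int(name),  # order names numerically
}

def sorted_row(row, super_comparator, subcolumn_comparator):
    """Return one row with super columns and their columns in sorted order.

    One comparator applies to every super column name and a second one to
    every contained column name; neither can vary within its level.
    """
    outer_key = COMPARATORS[super_comparator]
    inner_key = COMPARATORS[subcolumn_comparator]
    return {
        super_name: {
            column: row[super_name][column]
            for column in sorted(row[super_name], key=inner_key)
        }
        for super_name in sorted(row, key=outer_key)
    }

# Illustrative data: super column names are text, column names are numbers.
row = {
    "visits": {"10": "c", "9": "b", "100": "d"},
    "logins": {"2": "x", "11": "y"},
}
result = sorted_row(row, "UTF8Type", "LongType")
```

Here the super columns come out as "logins" then "visits", while the columns inside "visits" are ordered numerically as 9, 10, 100 rather than as strings.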
Super column families vs. views
Column families are schemaless, so each "row" can contain a different number of columns, and even the column names can differ from row to row.[4] They are therefore a very different concept from the rows of a relational database management system (RDBMS). This is one of the reasons why the concept is not trivial even for an experienced RDBMS expert.
Code example
Here is an example of a super column family that contains other column families:[4]
UserList={
    Cath:{
        username:{firstname:"Cath",lastname:"Yoon"}
        address:{city:"Seoul",postcode:"1234"}
    }
    Terry:{
        username:{firstname:"Terry",lastname:"Cho"}
        account:{bank:"hana",accounted:"1234"}
    }
}
Where "Cath" and "Terry" are row keys; "username", "address", and "account" are super column names; and "firstname", "lastname", "city", etc. are column names.
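As a rough illustration, the same structure can be written as a nested Python dictionary, which makes the "map of tables" view concrete. This is only an in-memory sketch, not how Cassandra stores or serves data; the `get_column` helper is a hypothetical name introduced here.

```python
# The UserList example as nested dictionaries:
# row key -> super column name -> column name -> value.
user_list = {
    "Cath": {
        "username": {"firstname": "Cath", "lastname": "Yoon"},
        "address": {"city": "Seoul", "postcode": "1234"},
    },
    "Terry": {
        "username": {"firstname": "Terry", "lastname": "Cho"},
        "account": {"bank": "hana", "accounted": "1234"},
    },
}

def get_column(family, row_key, super_column, column):
    """Look up a value by row key, super column name, and column name.

    Returns None when any level is absent, since rows are schemaless and
    need not share the same super columns.
    """
    return family.get(row_key, {}).get(super_column, {}).get(column)

city = get_column(user_list, "Cath", "address", "city")
missing = get_column(user_list, "Terry", "address", "city")
```

The second lookup returns `None` because the "Terry" row has an "account" super column but no "address" super column.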
References
- Ronald Mathies (2010-03-18). "Installing and using Apache Cassandra With Java Part 2 (Data model)". http://www.sodeso.nl/: Sodeso - Software Development Solutions. Retrieved 2011-03-28.
[...] the largest container, the SuperColumnFamily, if you understand the ColumnFamily then this construction isn’t much harder, instead of having Columns in the inner most Map we have SuperColumns. So it just adds an extra dimension. As displayed in the image, the Key of the Map which contain the SuperColumns must be the same as the name of the SuperColumn (just like with the ColumnFamily).
- Arin Sarkissian (2009-09-01). "WTF is a SuperColumn? An Intro to the Cassandra Data Model". http://arin.me/: Arin Sarkissian. Retrieved 2011-03-28.
4) a “Super Column Family” is a map of tables (=table of nested tables)
- "Installing and using Apache Cassandra With Java Part 3 (Data model 2)". http://www.sodeso.nl/: Sodeso - Software Development Solutions. Retrieved 2011-03-30.
The rules of sorting not only apply to Columns but also to SuperColumns, in case of the SuperColumns we also need to specify a second sorting rule using the CompareSubcolumnsWith attribute. [...] I used the UTF8Type for both the SuperColumn as for the Column within the SuperColumn, this doesn’t have to be the case, you can mix them using all the various sorting types. However it is not possible to have different sorting types on the same level, so it is not possible to use UTF8Type and the LongType for different SuperColumns in the same SuperColumnFamily, the same rule applies for Columns.
- Terry (2010-03-22). "Apache Cassandra Quick tour". Terry.Cho's blog. Retrieved 2011-03-25.
One of interest thing is each row can have different scheme. Cassandra row has “emailAddress” ,”age” column. TerryCho row has “emailAddress”,”gender” column. This characteristic is called as “Schemeless” (Data structure of each row in column family can be different).