Merge (SQL)
A relational database management system uses SQL MERGE
(also called upsert) statements to INSERT
new records or UPDATE
existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.
Usage
MERGE INTO tablename USING table_reference ON (condition)
WHEN MATCHED THEN
UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN
INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]);
A right join is employed over the Target (the INTO table) and the Source (the USING table / view / sub-query)--where Target is the left table and Source is the right one. The four possible combinations yield these rules:
- If the ON field(s) in the Source matches the ON field(s) in the Target, then UPDATE
- If the ON field(s) in the Source does not match the ON field(s) in the Target, then INSERT
- If the ON field(s) does not exist in the Source but does exist in the Target, then no action is performed.
- If the ON field(s) does not exist in either the Source or Target, then no action is performed.
If multiple Source rows match a given Target row, an error is mandated by SQL:2003 standards. You cannot update a Target row multiple times with a MERGE statement
Implementations
Database management systems Oracle Database, DB2, Teradata, EXASOL, Firebird, CUBRID, H2, HSQLDB, MS SQL, Vectorwise and Apache Derby support the standard syntax. Some also add non-standard SQL extensions.
Synonymous
Some database implementations adopted the term "Upsert" (a portmanteau of update and insert) to a database statement, or combination of statements, that inserts a record to a table in a database if the record does not exist or, if the record already exists, updates the existing record. This synonym is used in PostgreSQL (v9.5+)[1] and SQLite (v3.24+).[2] It is also used to abbreviate the "MERGE" equivalent pseudo-code.
It is used in Microsoft SQL Azure.[3]
Other non-standard implementations
Some other database management systems support this, or very similar behavior, through their own, non-standard SQL extensions.
MySQL, for example, supports the use of INSERT ... ON DUPLICATE KEY UPDATE
syntax[4] which can be used to achieve a similar effect with the limitation that the join between target and source has to be made only on PRIMARY KEY or UNIQUE constraints, which is not required in the ANSI/ISO standard. It also supports REPLACE INTO
syntax,[5] which first attempts an insert, and if that fails, deletes the row, if exists, and then inserts the new one. There is also an IGNORE
clause for the INSERT
statement,[6] which tells the server to ignore "duplicate key" errors and go on (existing rows will not be inserted or updated, but all new rows will be inserted).
SQLite's INSERT OR REPLACE INTO
works similarly. It also supports REPLACE INTO
as an alias for compatibility with MySQL.[7]
Firebird supports MERGE INTO
though fails to throw an error when there are multiple Source data rows. Additionally there is a single-row version, UPDATE OR INSERT INTO tablename (columns) VALUES (values) [MATCHING (columns)]
, but the latter does not give you the option to take different actions on insert versus update (e.g. setting a new sequence value only for new rows, not for existing ones.)
IBM DB2 extends the syntax with multiple WHEN MATCHED
and WHEN NOT MATCHED
clauses, distinguishing them with ... AND some-condition
guards.
Microsoft SQL Server extends with supporting guards and also with supporting Left Join via WHEN NOT MATCHED BY SOURCE
clauses.
PostgreSQL supports merging via INSERT INTO ... ON CONFLICT [ conflict_target ] conflict_action
.[8]
CUBRID supports MERGE INTO
[9] statement. And supports the use of INSERT ... ON DUPLICATE KEY UPDATE
syntax.[10] It also supports REPLACE INTO
for compatibility with MySQL.[11]
Apache Phoenix supports UPSERT VALUES
[12] and UPSERT SELECT
[13] syntax.
Spark SQL supports UPDATE SET *
and INSERT *
clauses in actions.[14]
Apache Impala supports UPSERT INTO ... SELECT
.[15]
Usage in NoSQL
A similar concept is applied in some NoSQL databases.
E.g. in MongoDB the fields in a value associated with a key can be updated with an update
operation. The update
rasies an error if the key is not found.
In the update
operation it is possible to set the upsert
flag: in this case a new value is stored associated to the given key if it does not exist, otherwise the whole value is replaced.
In Redis the SET
operations sets the value associated with a given key. Redis does not know any detail of the internal structure of the value, so an update would have no meaning. So the SET
operation has always a set or replace semantics.
See also
- Join in particular:
References
- PostgreSQL-tutorial
- upsert sqlite.org visited 6-6-2018
- Transact-SQL Reference (Database Engine): MERGE (Transact-SQL)
- MySQL :: MySQL 5.1 Reference Manual :: 12.2.4.3 INSERT ... ON DUPLICATE KEY UPDATE Syntax
- MySQL 5.1 Reference Manual: 11.2.6 REPLACE Syntax
- "MySQL 5.5 Reference Manual :: 13.2.5 INSERT Syntax". Retrieved 29 October 2013.
- "SQL As Understood By SQLite: INSERT". Retrieved 2012-09-27.
- PostgreSQL INSERT page
- "New CUBRID 9.0.0". CUBRID Official Blog. 2012-10-30. Retrieved 2012-11-08.
- CUBRID :: Data Manipulation Statements :: Insert :: ON DUPLICATE KEY UPDATE Clause
- CUBRID :: Data Manipulation Statements :: Replace
- "UPSERT VALUES".
- "UPSERT SELECT".
- "MERGE INTO (Delta Lake on Databricks)".
- "UPSERT Statement (Apache Impala Documentation)".
- Hsu, Leo; Obe, Regina (May 18, 2008). "Cross Compare of SQL Server, MySQL, and PostgreSQL". Postgres OnLine Journal. Retrieved 8 October 2010.
- Chodorow, Kristina; Mike Dirolf (September 2010). MongoDB: The Definitive Guide. O'Reilly. ISBN 978-1-449-38156-1.