I was lazy when I originally created a table where one column could contain multiple email addresses, separated by commas, spaces or returns in the field.
Now I'd like to create a separate table that holds just the email addresses. Since several emails can belong to the same original row, I'd also like a mapping table that links the email rows to the original rows. (I understand that in this one-to-many case, where each email can only belong to one customer, the extra mapping table is not required, but I'd like to keep it for the sake of solving the specific challenge behind this question.)
There are two challenges for me:
- How do I write the `SELECT` so that I get multiple rows for the comma-separated values? The solution I found is to use `UNNEST(string_to_array(regexp_replace(TRIM(emails),'\s+',',','g'),','))`. (The `TRIM` is for safety, removing extra spaces around the field so that I do not end up with empty rows. Is there a better way, e.g. by removing empty elements from the array? Searching locates this, which seems to be needlessly complex for my case, though.)
- When I create the mapping row, I need two ids. How do I get these two values? One I get from inserting new rows based on the original rows, but I also need the original row's id. That means that when I select the original rows, I need to fetch the id along with the emails, but I do not understand how to keep it around while using the emails for the insertion.
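For the first challenge, one possible direction is sketched below. It assumes PostgreSQL 9.3 or later (for `LATERAL`) and the `test_customers` table from the example; the aliases `c` and `e` are only illustrative. Splitting on runs of commas and whitespace with `regexp_split_to_table` avoids the separate `regexp_replace` step, and a plain `WHERE` filter drops any empty element left by a leading or trailing separator. Because the set-returning function sits in the `FROM` clause, the customer's `id` can be selected next to each email.

```sql
-- Sketch only: one output row per email, paired with the originating customer id.
SELECT c.id AS cust_id,
       e.email
FROM test_customers AS c
-- '[\s,]+' treats any run of commas, spaces or line breaks as a single separator
CROSS JOIN LATERAL regexp_split_to_table(c.emails, '[\s,]+') AS e(email)
-- a leading or trailing separator yields an empty string; discard it
WHERE e.email <> '';
```

If an array is preferred, `array_remove(regexp_split_to_array(emails, '[\s,]+'), '')` should remove the empty elements in a similar way before `UNNEST` is applied.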
Example
Original data
    DROP TABLE IF EXISTS test_customers;
    CREATE TABLE test_customers (
        id SERIAL PRIMARY KEY,
        emails TEXT
    );
    INSERT INTO test_customers (emails) VALUES
        ('a@x.com'),
        ('b1@x.com,b2@x.com');
    SELECT * FROM test_customers;
Table test_customers
id | emails |
---|---|
1 | a@x.com |
2 | b1@x.com,b2@x.com |
Desired outcome
The new tables
    DROP TABLE IF EXISTS test_emails;
    CREATE TABLE test_emails (
        id SERIAL PRIMARY KEY,
        email TEXT
    );
    DROP TABLE IF EXISTS test_map;
    CREATE TABLE test_map (
        cust_id SERIAL,
        email_id SERIAL,
        CONSTRAINT fk_customer FOREIGN KEY(cust_id) REFERENCES test_customers(id),
        CONSTRAINT fk_email FOREIGN KEY(email_id) REFERENCES test_emails(id)
    );
Table test_customers
Basically, the `emails` column isn't needed any more (I can simply drop it at the end, a no-brainer):
id |
---|
1 |
2 |
Table test_emails
The emails are now here:
id | email |
---|---|
1 | a@x.com |
2 | b1@x.com |
3 | b2@x.com |
Table test_map
And this links the emails to the customers:
cust_id | email_id |
---|---|
1 | 1 |
2 | 2 |
2 | 3 |
My SQL code for converting all rows, so far
    WITH map_ins AS (
        INSERT INTO test_emails (email)
        SELECT id AS cust_id,
               UNNEST(string_to_array(regexp_replace(TRIM(emails),'\s+',',','g'),','))
        FROM test_customers
        RETURNING id AS email_id, cust_id
    )
    INSERT INTO test_map (cust_id, email_id)
    VALUES (cust_id, email_id);
This gives an error:
INSERT has more expressions than target columns
That's where I'm stuck. The first `SELECT` needs to fetch both the id and the emails so that I can return them further down, but if I do that, how do I omit the id in the `INSERT`, where I only want to insert the `email` but not the `id` into the row?
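For reference, here is a hedged sketch of one way around this, not necessarily the best one. The error appears because the `SELECT` supplies two values for the single target column `email`. The `RETURNING` clause can only return columns of the inserted `test_emails` row, so `cust_id` would not be available there either. The sketch sidesteps both problems by drawing the new email ids from the sequence up front, so that both inserts can read the same `(cust_id, email_id)` pairs. It assumes PostgreSQL 9.3 or later and that `test_emails.id` is backed by the sequence that `SERIAL` created; the CTE names `split` and `ins` are illustrative.

```sql
-- Sketch only: pre-allocate the email ids, then feed both inserts from one CTE.
WITH split AS (
    SELECT c.id AS cust_id,
           -- take the id from the sequence behind test_emails.id
           nextval(pg_get_serial_sequence('test_emails', 'id')) AS email_id,
           e.email
    FROM test_customers AS c
    CROSS JOIN LATERAL regexp_split_to_table(c.emails, '[\s,]+') AS e(email)
    WHERE e.email <> ''
),
ins AS (
    -- insert the email rows with the pre-allocated ids
    INSERT INTO test_emails (id, email)
    SELECT email_id, email
    FROM split
)
-- link each customer to its emails using the same pairs
INSERT INTO test_map (cust_id, email_id)
SELECT cust_id, email_id
FROM split;
```

Note that the final `INSERT` reads from the CTE with `SELECT ... FROM split`; a bare `VALUES (cust_id, email_id)` cannot refer to CTE columns.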