■ はじめに
業務で、PostgreSQL / Snowflake において REGEXP_SUBSTR や REGEXP_REPLACE でてきたので 調べて、徐々にではあるがまとめておく。 なお、サンプルは、PostgreSQL17で試した。
SQL Fiddle
https://www.db-fiddle.com/
目次
【0】正規表現関数 【1】REGEXP_SUBSTR 【2】REGEXP_LIKE 【3】REGEXP_REPLACE 【4】REGEXP_INSTR 【5】REGEXP_COUNT
【0】正規表現関数
PostgreSQL
https://www.postgresql.jp/docs/15/functions-matching.html
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions-regexp
【1】REGEXP_SUBSTR
* 指定したパターンに一致する部分文字列を抽出
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_substr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_SUBSTR.html
SELECT '123.456.6789' AS data, regexp_substr('123.456.6789', '[0-9]+', 1, 1) AS result1, regexp_substr('123.456.6789', '[0-9]+', 1, 2) AS result2, regexp_substr('123.456.6789', '[0-9]+', 1, 3) AS result3 ;
data | result1 | result2 | result3 |
---|---|---|---|
123.456.6789 | 123 | 456 | 6789 |
-- Sample data CREATE TABLE sample_table (id INT, data VARCHAR); INSERT INTO sample_table (id, data) VALUES -- A string with multiple occurrences of the word "the". (1, 'It was the best of times, it was the worst of times.'), -- A string with multiple occurrences of the word "the" and with extra -- blanks between words. (2, 'In the string the extra spaces are redundant.'), -- A string with the character sequence "the" inside multiple words -- ("thespian" and "theater"), but without the word "the" by itself. (3, 'A thespian theater is nearby.') ; -- Example select id, regexp_substr(data, 'the\W+\w+') as "result" from sample_table order by id ;
id | result |
---|---|
1 | the best |
2 | the string |
3 | (null) |
【2】REGEXP_LIKE
* 指定したパターンに一致するかどうかをチェックするために使用
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_like
SELECT 'hello.world@sample.com' AS data, regexp_like('hello.world@sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result UNION ALL SELECT 'hello.world^sample.com' AS data, regexp_like('hello.world_sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result UNION ALL SELECT 'hello.world@111' AS data, regexp_like('hello.world@111', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result ;
data | result |
---|---|
hello.world@sample.com | true |
hello.world_sample.com | false |
hello.world@111 | false |
【3】REGEXP_REPLACE
* 指定したパターンに一致する部分を他の文字列に置換
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_replace
-- オプション 'g': 全て置換 SELECT '123.456.6789' AS data, REGEXP_REPLACE('123.456.6789', '[^0-9]', '-', 'g') AS result ;
data | result |
---|---|
123.456.6789 | 123-456-6789 |
【4】REGEXP_INSTR
* 指定したパターンが最初に出現する位置を返す
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_instr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_INSTR.html
-- Sample data CREATE TABLE demo_table(id INT, data VARCHAR); INSERT INTO demo_table(id, data) VALUES (1, 'nevermore1, nevermore2, nevermore3.') ; -- Example SELECT id, data, regexp_substr(data, 'nevermore\d') AS "SUBSTRING", regexp_instr(data, 'nevermore\d') AS "POSITION" FROM demo_table UNION ALL SELECT id, data, regexp_substr(data, 'nevermore\d', 5) AS "SUBSTRING", regexp_instr(data, 'nevermore\d', 5) AS "POSITION" FROM demo_table UNION ALL SELECT id, data, regexp_substr(data, 'nevermore\d', 1, 3) AS "SUBSTRING", regexp_instr(data, 'nevermore\d', 1, 3) AS "POSITION" FROM demo_table ;
id | data | SUBSTRING | POSITION |
---|---|---|---|
1 | nevermore1, nevermore2, nevermore3. | nevermore1 | 1 |
1 | nevermore1, nevermore2, nevermore3. | nevermore2 | 13 |
1 | nevermore1, nevermore2, nevermore3. | nevermore3 | 25 |
【5】REGEXP_COUNT
* パターンに一致する文字列の個数をカウント
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_count
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_COUNT.html
-- オプション「i」: 大文字と小文字を区別しない一致を有効 SELECT 'ABCABCDEF' AS data, REGEXP_COUNT('ABCABCDEF', 'Abc', 1, 'i') As result ;
data | result |
---|---|
ABCABCDEF | 2 |
参考文献
https://sql-marketing-academy.subroq.co.jp/blogs/regex-functions
関連記事
SQL ~ 文字列操作関連 ~
https://dk521123.hatenablog.com/entry/2013/01/21/233512
Snowflake ~ 文字列操作関連 ~
https://dk521123.hatenablog.com/entry/2022/10/01/000000
【PostgreSQL】文字列関数 ~ string_agg ~
https://dk521123.hatenablog.com/entry/2021/08/21/000000
【PostgreSQL】文字列関数 ~ split_part ~
https://dk521123.hatenablog.com/entry/2021/09/07/000000