【SQL】SQL ~ 正規表現関数 / REGEXP_XXX ~

■ はじめに

業務で、PostgreSQL / Snowflake において
REGEXP_SUBSTR や REGEXP_REPLACE でてきたので
調べて、徐々にではあるがまとめておく。

なお、サンプルは、PostgreSQL17で試した。

SQL Fiddle
https://www.db-fiddle.com/

目次

【0】正規表現関数
【1】REGEXP_SUBSTR
【2】REGEXP_LIKE
【3】REGEXP_REPLACE
【4】REGEXP_INSTR
【5】REGEXP_COUNT

【0】正規表現関数

PostgreSQL
https://www.postgresql.jp/docs/15/functions-matching.html
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions-regexp

【1】REGEXP_SUBSTR

* 指定したパターンに一致する部分文字列を抽出

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_substr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_SUBSTR.html

SELECT
 '123.456.6789' AS data,
 regexp_substr('123.456.6789', '[0-9]+', 1, 1) AS result1,
 regexp_substr('123.456.6789', '[0-9]+', 1, 2) AS result2,
 regexp_substr('123.456.6789', '[0-9]+', 1, 3) AS result3 
;
data result1 result2 result3
123.456.6789 123 456 6789
-- Sample data
CREATE TABLE sample_table (id INT, data VARCHAR);

INSERT INTO sample_table (id, data) VALUES 
    -- A string with multiple occurrences of the word "the".
    (1, 'It was the best of times, it was the worst of times.'),
    -- A string with multiple occurrences of the word "the" and with extra
    -- blanks between words.
    (2, 'In    the   string   the   extra   spaces  are   redundant.'),
    -- A string with the character sequence "the" inside multiple words 
    -- ("thespian" and "theater"), but without the word "the" by itself.
    (3, 'A thespian theater is nearby.')
;

-- Example
select
  id,
  regexp_substr(data, 'the\W+\w+') as "result"
from
  sample_table
order by id
;
id result
1 the best
2 the string
3 (null)

【2】REGEXP_LIKE

* 指定したパターンに一致するかどうかをチェックするために使用

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_like

SELECT
 'hello.world@sample.com' AS data,
 regexp_like('hello.world@sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result
UNION ALL
SELECT
 'hello.world^sample.com' AS data,
 regexp_like('hello.world_sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result
UNION ALL
SELECT
 'hello.world@111' AS data,
 regexp_like('hello.world@111', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result 
;
data result
hello.world@sample.com true
hello.world_sample.com false
hello.world@111 false

【3】REGEXP_REPLACE

* 指定したパターンに一致する部分を他の文字列に置換

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_replace

-- オプション 'g': 全て置換
SELECT
 '123.456.6789' AS data,
 REGEXP_REPLACE('123.456.6789', '[^0-9]', '-', 'g') AS result
;
data result
123.456.6789 123-456-6789

【4】REGEXP_INSTR

* 指定したパターンが最初に出現する位置を返す

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_instr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_INSTR.html

-- Sample data
CREATE TABLE demo_table(id INT, data VARCHAR);

INSERT INTO demo_table(id, data) VALUES 
(1, 'nevermore1, nevermore2, nevermore3.')
;

-- Example
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d') AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d') AS "POSITION"
FROM demo_table
UNION ALL
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d', 5) AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d', 5) AS "POSITION"
FROM demo_table
UNION ALL
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d', 1, 3) AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d', 1, 3) AS "POSITION"
FROM demo_table
;
id data SUBSTRING POSITION
1 nevermore1, nevermore2, nevermore3. nevermore1 1
1 nevermore1, nevermore2, nevermore3. nevermore2 13
1 nevermore1, nevermore2, nevermore3. nevermore3 25

【5】REGEXP_COUNT

* パターンに一致する文字列の個数をカウント

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_count
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_COUNT.html

-- オプション「i」: 大文字と小文字を区別しない一致を有効
SELECT
  'ABCABCDEF' AS data,
  REGEXP_COUNT('ABCABCDEF', 'Abc', 1, 'i') As result
;
data result
ABCABCDEF 2

参考文献

https://sql-marketing-academy.subroq.co.jp/blogs/regex-functions

関連記事

SQL ~ 文字列操作関連 ~
https://dk521123.hatenablog.com/entry/2013/01/21/233512
Snowflake ~ 文字列操作関連 ~
https://dk521123.hatenablog.com/entry/2022/10/01/000000
PostgreSQL】文字列関数 ~ string_agg ~
https://dk521123.hatenablog.com/entry/2021/08/21/000000
PostgreSQL】文字列関数 ~ split_part ~
https://dk521123.hatenablog.com/entry/2021/09/07/000000