■ はじめに

業務で、PostgreSQL / Snowflake において
REGEXP_SUBSTR や REGEXP_REPLACE でてきたので
調べて、徐々にではあるがまとめておく。

なお、サンプルは、PostgreSQL17で試した。

SQL Fiddle
https://www.db-fiddle.com/

【０】正規表現関数

PostgreSQL
https://www.postgresql.jp/docs/15/functions-matching.html
Snowflake
https://docs.snowflake.com/ja/sql-reference/functions-regexp

【１】REGEXP / ~

* 正規表現に一致するかどうか

PostgreSQL

演算子	説明
~	正規表現に一致する。大文字小文字の区別あり
~*	正規表現に一致する。大文字小文字の区別なし
!~	正規表現に一致しない。大文字小文字の区別あり
!~*	正規表現に一致しない。大文字小文字の区別なし

SELECT * FROM users WHERE name ~ '^[A-Z]';

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp

SELECT * FROM users WHERE name REGEXP '^[A-Z]';

【２】REGEXP_SUBSTR

* 指定したパターンに一致する部分文字列を抽出

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_substr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_SUBSTR.html

SELECT
 '123.456.6789' AS data,
 regexp_substr('123.456.6789', '[0-9]+', 1, 1) AS result1,
 regexp_substr('123.456.6789', '[0-9]+', 1, 2) AS result2,
 regexp_substr('123.456.6789', '[0-9]+', 1, 3) AS result3 
;

data	result1	result2	result3
123.456.6789	123	456	6789

-- Sample data
CREATE TABLE sample_table (id INT, data VARCHAR);

INSERT INTO sample_table (id, data) VALUES 
    -- A string with multiple occurrences of the word "the".
    (1, 'It was the best of times, it was the worst of times.'),
    -- A string with multiple occurrences of the word "the" and with extra
    -- blanks between words.
    (2, 'In    the   string   the   extra   spaces  are   redundant.'),
    -- A string with the character sequence "the" inside multiple words 
    -- ("thespian" and "theater"), but without the word "the" by itself.
    (3, 'A thespian theater is nearby.')
;

-- Example
select
  id,
  regexp_substr(data, 'the\W+\w+') as "result"
from
  sample_table
order by id
;

id	result
1	the best
2	the string
3	(null)

【３】REGEXP_LIKE

* 指定したパターンに一致するかどうかをチェックするために使用

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_like

SELECT
 'hello.world@sample.com' AS data,
 regexp_like('hello.world@sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result
UNION ALL
SELECT
 'hello.world^sample.com' AS data,
 regexp_like('hello.world_sample.com', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result
UNION ALL
SELECT
 'hello.world@111' AS data,
 regexp_like('hello.world@111', '^[a-zA-Z0-9_.+-]+@([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]*\.)+[a-zA-Z]{2,}$') AS result 
;

data	result
hello.world@sample.com	true
hello.world_sample.com	false
hello.world@111	false

【４】REGEXP_REPLACE

* 指定したパターンに一致する部分を他の文字列に置換

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_replace

-- オプション 'g': 全て置換
SELECT
 '123.456.6789' AS data,
 REGEXP_REPLACE('123.456.6789', '[^0-9]', '-', 'g') AS result
;

data	result
123.456.6789	123-456-6789

【５】REGEXP_INSTR

* 指定したパターンが最初に出現する位置を返す

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_instr
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_INSTR.html

-- Sample data
CREATE TABLE demo_table(id INT, data VARCHAR);

INSERT INTO demo_table(id, data) VALUES 
(1, 'nevermore1, nevermore2, nevermore3.')
;

-- Example
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d') AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d') AS "POSITION"
FROM demo_table
UNION ALL
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d', 5) AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d', 5) AS "POSITION"
FROM demo_table
UNION ALL
SELECT
  id,
  data,
  regexp_substr(data, 'nevermore\d', 1, 3) AS "SUBSTRING", 
  regexp_instr(data, 'nevermore\d', 1, 3) AS "POSITION"
FROM demo_table
;

id	data	SUBSTRING	POSITION
1	nevermore1, nevermore2, nevermore3.	nevermore1	1
1	nevermore1, nevermore2, nevermore3.	nevermore2	13
1	nevermore1, nevermore2, nevermore3.	nevermore3	25

【６】REGEXP_COUNT

* パターンに一致する文字列の個数をカウント

Snowflake
https://docs.snowflake.com/ja/sql-reference/functions/regexp_count
Redshift
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/REGEXP_COUNT.html

-- オプション「i」: 大文字と小文字を区別しない一致を有効
SELECT
  'ABCABCDEF' AS data,
  REGEXP_COUNT('ABCABCDEF', 'Abc', 1, 'i') As result
;

data	result
ABCABCDEF	2

参考文献

https://sql-marketing-academy.subroq.co.jp/blogs/regex-functions

SQL ～文字列操作関連～
https://dk521123.hatenablog.com/entry/2013/01/21/233512
Snowflake ～文字列操作関連～
https://dk521123.hatenablog.com/entry/2022/10/01/000000
【PostgreSQL】文字列関数～ string_agg ～
https://dk521123.hatenablog.com/entry/2021/08/21/000000
【PostgreSQL】文字列関数～ split_part ～
https://dk521123.hatenablog.com/entry/2021/09/07/000000

ランキング参加中

プログラミング

プログラムの超個人的なメモ

Memo for Programming.

【SQL】SQL ～正規表現関数 / REGEXP_XXX ～

■ はじめに

目次

【０】正規表現関数

【１】REGEXP / ~

【２】REGEXP_SUBSTR

【３】REGEXP_LIKE

【４】REGEXP_REPLACE

【５】REGEXP_INSTR

【６】REGEXP_COUNT

参考文献

関連記事