REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR , but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string.
Syntax :
REGEXP_SUBSTR(source_char, pattern
[, position
[, occurrence
[, match_param
[, subexpr
]
]
]
]
)
Name |
Description |
source_char |
source_char is a character expression that serves as the search value. It is commonly a character column and can be of any of the data types CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB or NCLOB. |
pattern |
pattern is the regular expression. It is usually a text literal and can be of any of the data types CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes. If the data type of pattern is different from the data type of source_char, then Oracle Database converts pattern to the data type of source_char. |
position |
position is a positive integer indicating the character of source_char where Oracle should begin the search. The default is 1, meaning that Oracle begins the search at the first character of source_char.
|
occurrence |
occurrence is a positive integer indicating which occurrence of pattern in source_char Oracle should search for. The default is 1, meaning that Oracle searches for the first occurrence of pattern. If occurrence is greater than 1, then the database searches for the second occurrence beginning with the first character following the first occurrence of pattern, and so forth. If occurrence is greater than 1, then the database searches for the second occurrence beginning with the first character following the first occurrence of pattern, and so forth. |
match_param |
match_parameter is a text literal that lets you change the default matching behavior of the function. This argument affects only the matching process and has no effect on replace_string. You can specify one or more of the following values for match_parameter: |
subexpr |
For a pattern with subexpressions, subexpr is a nonnegative integer from 0 to 9 indicating which subexpression in pattern is to be returned by the function. |
Characters |
Character Class |
Description |
^ |
Anchor the expression to the start of a line |
$ |
Anchor the expression to the end of a line |
|
Equivalence Classes |
Character Class |
Description |
= = |
Oracle supports the equivalence classes through the POSIX '[==]' syntax. A base letter and all of its accented versions constitute an equivalence class. For example, the equivalence class '[=a=]' matches ä and â. The equivalence classes are valid only inside the bracketed expression |
|
Match Options |
Character Class |
Description |
c |
Case sensitive matching |
i |
Case insensitive matching |
m |
Treat source string as multi-line activating Anchor chars |
n |
Allow the period (.) to match any newline character |
|
Posix Characters |
Character Class |
Description |
[:alnum:] |
Alphanumeric characters |
[:alpha:] |
Alphabetic characters |
[:blank:] |
Blank Space Characters |
[:cntrl:] |
Control characters (nonprinting) |
[:digit:] |
Numeric digits |
[:graph:] |
Any [:punct:], [:upper:], [:lower:], and [:digit:] chars |
[:lower:] |
Lowercase alphabetic characters |
[:print:] |
Printable characters |
[:punct:] |
Punctuation characters |
[:space:] |
Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed |
[:upper:] |
Uppercase alphabetic characters |
[:xdigit:] |
Hexidecimal characters |
|
Quantifier Characters |
Character Class |
Description |
* |
Match 0 or more times |
? |
Match 0 or 1 time |
+ |
Match 1 or more times |
{m} |
Match exactly m times |
{m,} |
Match at least m times |
{m, n} |
Match at least m times but no more than n times |
\n |
Cause the previous expression to be repeated n times |
|
Alternative Matching And Grouping Characters |
Character Class |
Description |
| |
Separates alternates, often used with grouping operator () |
( ) |
Groups subexpression into a unit for alternations, for quantifiers, or for backreferencing (see "Backreferences" section) |
[char] |
Indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes, and the ^ and - metacharacters |
|
|
The following example parse the last name and first from the full name string:
select
regexp_substr('Bond,Jame', '[^,]+', 1, 1) last_name, -- first occurrence of non-comma substring
regexp_substr('Bond,Jame', '[^,]+', 1, 2) first_name -- second occurrence of non-comma substring
from dual;
output:
LAST_NAME FIRST_NAME
Bond Jame
|
The following example examines the string, looking for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (. ). Oracle searches for a minimum of two occurrences of this substring between http:// and either a slash (/ ) or the end of the string.
SELECT
REGEXP_SUBSTR('http://www.example.com/products',
'http://([[:alnum:]]+\.?)+{2,}/?') "Host_Url"
FROM DUAL;
HOST_URL
----------------------
http://www.example.com/
|
Select the second substring of letters that ends with "thy."
SELECT REGEXP_SUBSTR('healthy, wealthy, and wise','\w+thy',1,2);
REGEXP_SUBSTR
---------------
wealthy
|