text = """
SELECT
t.PARTY_ID,
FROM ...
SELECT
a.PARTY_ID,
WHERE ...
"""
pattern_regex = r"(?i)SELECT\s+[\s\S]*?\.PARTY_ID\s*,"
replacement = "select aa,"
SELECT
t.PARTY_ID,
FROM ...
select aa
WHERE ...
import re
def replace_last_occurrence(text, pattern_regex, replacement):
# 查找所有匹配正则表达式的项
matches = list(re.finditer(pattern_regex, text))
if not matches:
return text # 没有匹配项,返回原文本
last_match = matches[-1] # 最后一个匹配项
# 替换操作
return text[:last_match.start()] + replacement + text[last_match.end():]
sql_text = """
SELECT
t.PARTY_ID,
FROM customer t
WHERE ...
SELECT
a.PARTY_ID,
ORDER BY a.PARTY_ID;
"""
# 匹配以 SELECT 开头,包含 .PARTY_ID 并以逗号结尾的子句块(跨行也支持)
pattern_regex = r"(?i)SELECT\s+[\s\S]*?\.PARTY_ID\s*,"
# 匹配以 SELECT 开头,包含 PARTY_ID,并且PARTY_ID可能有别名的情况
pattern_regex = r"(?i)SELECT\s+[\s\S]*?\w{1,3}\.?PARTY_ID\s*,"
replacement = "select aa,"
result = replace_last_occurrence(sql_text, pattern_regex, replacement)
print(result)
(?i):忽略大小写
SELECT\s+:匹配 SELECT 后面至少一个空白字符
[\s\S]*?:非贪婪地匹配任意字符(包括换行),用于跨行匹配
\.PARTY_ID:匹配 .PARTY_ID
\s*,:匹配可选空格加逗号
如果你希望这个函数适用于多种字段(不只是 PARTY_ID),可以将字段名参数化:
def replace_last_select_field(text, field_name, replacement):
pattern_regex = fr"(?i)SELECT\s+[\s\S]*?\.{field_name}\s*,"
return replace_last_occurrence(text, pattern_regex, replacement)
replace_last_select_field(sql_text, "PARTY_ID", "select aa,")
|
text = """
SELECT
t.amt,
FROM ...
SELECT
a.PARTY_ID,
WHERE ...
"""
如果是这样的格式,那么示例1会直接将第1个SELECT与最后一个PARTY_ID作为匹配项
毕竟,它符合以SELECT开关,以PARTY_ID结尾
pattern_regex = r"(?i)SELECT\s+[\s\S]{1,4}?\w{1,4}\.?PARTY_ID\s*,"
\S 任意非空字符,任意个连续的非空字符都能匹配,中间有空格就无法匹配
{1,4}这里限定1-4个字符,实际情况是只能有空字符不能有非空字符的,
但1-4这限定也除非前面第1个SELECT被误选的情况
实际上去掉\S应该更符合本意,就是只有空白
上面的正则匹配出问题了,有段文本没有匹配到最后一个select,反而匹配到中间的一个select
于是就去掉了\S,下面的模式可以正常使用
pattern_regex = r"(?i)SELECT\s+\w{1,4}\.?PARTY_ID\s*,"
这也说明正则匹配应该精准,并且是精准的模式越好
|