Are C# balancing groups supported in regular expressions?


DJ asked 6 years ago

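I need to match the <dl> tags in an htm file, but they can be nested, and I want to match every <dl> together with its corresponding </dl>. I tried using C# balancing groups: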

    $patternNobullet = [regex]"<dl[^>]*>[\s\S]*(((?'Open'<dl[^>]*>)[\s\S]*)+((?'-Open'</dl>)[\s\S]*)+)*(?(Open)(?!))</dl>"

    [System.Collections.ArrayList]$tablecontents = @()

    $match = $patternNobullet.Match($content)
    while ($match.Success) {
        $tablecontents.Add($match.Value) | out-null
        $match = $match.NextMatch()
    }

The code above matches the outermost <dl> element, but only the outermost one; the <dl> elements nested inside it are not matched.

After that I tried the following approach:

    $patternNobullet = [regex]"<dl[^>]*>[\s\S]*(((?'Open'<dl[^>]*>)[\s\S]*)+((?'-Open'</dl>)[\s\S]*)+)*(?(Open)(?!))</dl>"

    $content | Select-String $patternNobullet -AllMatches | ForEach-Object{
      foreach($v in $_.Matches)
      {...}
    }

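However, this approach does not seem to support balancing groups: it matches nothing at all. Does anyone have a good way to extract the content of all the matching tag pairs?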

Here is the content of the htm file:

<dl>
<dd>[Scenario or Feature Name] (Entry Page)<dl>
<dd>Why [Do Scenario or Use Feature]? </dd>
<dd>What’s New for [Scenario or Feature] in [Product] [Version#]? </dd>
<dd>Getting Started with [Scenario or Feature]<dl>
<dd>Learning Path for [Scenario or Feature]</dd>
<dd>Prepare Your Development Environment for [Scenario or Feature]</dd>
<dd>Tutorial: Create your First [Scenario or Feature Application]</dd>
<dd>Community Resources for [Scenario or Feature]</dd>
</dl>
</dd>
<dd>How to [Complete Scenario or Use Feature]<dl>
<dd>Best Practices for [Scenario or Feature]</dd>
<dd>How to [Complete a Dev Scenario] (Scenario Portal)<dl>
<dd>Best Practices for [Scenario]</dd>
<dd>Design Considerations for [Scenario]</dd>
<dd>How to: [Complete Task 1 in Scenario]</dd>
<dd>How to: [Complete Task 2 in Scenario]</dd>
<dd>How to: [Complete Task N in Scenario]</dd>
<dd>Testing Your [Scenario]</dd>
<dd>Troubleshooting Your [Scenario]</dd>
</dl>
</dd>
<dd>How to: [Complete Some Task]</dd>
<dd>How to: [Complete Some Task]</dd>
</dl>
</dd>
<dd>[Scenario or Feature] Concepts<dl>
<dd>[Scenario or Feature] Overview</dd>
<dd>[Scenario or Feature] Architecture</dd>
<dd>&lt;other conceptual topics&gt;</dd>
</dl>
</dd>
<dd>[Scenario or Feature] Reference<dl>
<dd>&lt;standard reference topics or an index to WinRT reference topics&gt;</dd>
</dl>
</dd>
<dd>[Scenario or Feature] Tools</dd>
<dd>[Scenario or Feature] Samples</dd>
</dl>
</dd>
</dl>

1 Answer
Mooser Lee (administrator) answered 6 years ago

I'd rather not dig through a regular expression this complex. To parse an HTML document, use a dedicated tool such as HtmlAgilityPack.
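For readers who cannot pull in an HTML parser, here is a minimal sketch of how the balancing-group idea from the question could be made to return the nested <dl> elements as well. It is not the asker's pattern: it wraps a balanced pattern in a zero-width lookahead so that matches may overlap, and it assumes every closing tag is written exactly as </dl>. The group name Element and the short sample string in $content are made up for illustration.

```powershell
# Balanced <dl>...</dl> pattern inside a lookahead. The lookahead consumes no
# input, so the engine retries at every later position and also finds the
# <dl> elements nested inside an earlier match.
$balancedDl = @'
(?=(?<Element>
    <dl\b[^>]*>                    # opening tag of the candidate element
    (?>                            # atomic: do not backtrack into the content
        (?:
            (?<Open><dl\b[^>]*>)   # nested opening tag: push onto the Open stack
          | (?<-Open></dl>)        # nested closing tag: pop (fails if the stack is empty)
          | (?!</?dl\b)[\s\S]      # any character that does not start a dl tag
        )*
    )
    (?(Open)(?!))                  # fail if a nested opening tag is still unclosed
    </dl>                          # closing tag paired with the candidate
))
'@

$options = [System.Text.RegularExpressions.RegexOptions]'IgnorePatternWhitespace, IgnoreCase'
$regex   = [regex]::new($balancedDl, $options)

# Hypothetical sample input: one outer <dl> containing two nested ones.
$content = '<dl><dd>A<dl><dd>B</dd></dl></dd><dd>C<dl><dd>D</dd></dl></dd></dl>'

# Every match is zero-width; the element text lives in the Element group.
$tablecontents = @(foreach ($m in $regex.Matches($content)) {
    $m.Groups['Element'].Value
})

$tablecontents.Count   # 3: the outer element plus the two nested ones
```

On the Select-String attempt: Select-String uses the same .NET regex engine, so balancing groups are available there too. One likely reason it matched nothing is that Get-Content without -Raw emits one string per line, and Select-String tests each input string separately, so a pattern that spans several lines never sees a complete element. Reading the file as a single string (Get-Content -Raw) avoids that.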