Most of us Swift developers do not write regular expressions on a day-to-day basis. Each time we have to, we rely on web searches and old documentation. Then we have to deal with the lack of safety and go through many runs before getting the expected result. It feels like a big step back from what we are used to when coding in a modern language such as Swift.
We can emphasize two issues here:
- In Swift, regexes are strings, which means no compile-time checks, no type safety, and no code completion.
- By nature, regexes are mostly write-only. Unless you make an effort or use them often, they will never be as easy to understand as the rest of your codebase.
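To make the first issue concrete, here is a minimal sketch using Foundation's NSRegularExpression. The helper name isValidPattern is made up for this illustration. Both calls compile without complaint, and the malformed pattern is only detected when the code runs.

```swift
import Foundation

// Hypothetical helper: returns true when `pattern` compiles as a regex.
// The compiler cannot check the string; errors only surface at runtime,
// when the NSRegularExpression initializer throws.
func isValidPattern(_ pattern: String) -> Bool {
    do {
        _ = try NSRegularExpression(pattern: pattern)
        return true
    } catch {
        return false
    }
}

// Both lines compile; only running them reveals the unclosed parenthesis.
let wellFormed = isValidPattern("\\d+")
let malformed = isValidPattern("(\\d+")
print(wellFormed, malformed)
```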
Enter SwiftRegexDSL, a Declarative Structured Language for regular expressions in Swift. The idea is to apply the same "magic" that powers SwiftUI, result builders (https://github.com/apple/swift-evolution/blob/main/proposals/0289-result-builders.md), to regexes. The DSL provides readable expressions that are far better suited to composition and control flow, and it brings compile-time checks. To summarise, fewer headaches with regexes!
struct ThisIsARegex: Regex {
    let shouldMatchLine: Bool

    var body: Regex {
        "Hello"
        WhiteSpace()
        "World,"
        if shouldMatchLine {
            Line()
        }
        AnyCharacter()
            .oneOrMore()
    }
}
...
let regex = ThisIsARegex(shouldMatchLine: false)
"Hello World, how...".match(regex) // true
SwiftRegexDSL is bundled as a Swift package. You can add it from Xcode to your iOS or macOS project via File > Swift Packages > Add Package Dependency and searching for https://github.com/kodlian/SwiftRegexDSL.git
If you are working outside Xcode, add it to the dependencies section of Package.swift:
dependencies: [
    .package(url: "https://github.com/kodlian/SwiftRegexDSL.git", .upToNextMajor(from: "1.0.0"))
]
As with a SwiftUI view, you declare your regex as a struct, preferably in a separate file, and use the body to build the regex.
import SwiftRegexDSL

struct MyRegex: Regex {
    var body: Regex {
        Digit()
            .oneOrMore()
        Word()
    }
}
You can also annotate any variable or function with @RegexBuilder:
@RegexBuilder
var digits: Regex {
    Digit()
        .exactly(10)
}
SwiftRegexDSL implements the most common patterns found in the ICU regular expression API supported by Swift Foundation.
The most common character classes and special characters are supported, such as AnyCharacter, Digit, NotDigit, Word, and Whitespace. Please refer to CharacterClass.swift and SpecialCharacters.swift to see the full coverage.
Of course, you can add any string to your regex body, either directly or by using the Text regex.
var body: Regex {
    "Title"
    Text("-")
        .quantified(0..<2)
    Digit()
}
Note that a String is not a Regex component per se, but rather an expression convertible to a Regex. This means that if you need to apply a modifier to a string, you have to wrap it in a Text.
You can attach a quantifier using the quantified(...) modifier or any of the shortcuts zeroOrMore, oneOrMore, zeroOrOne, and exactly to specify the number of occurrences a pattern should match.
var body: Regex {
    Text("-")
        .zeroOrOne()
    Digit()
        .quantified(1..<4)
}
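For comparison, here is a hand-written ICU pattern with roughly the same intent, checked through Foundation. This is only a sketch: it assumes that the half-open range 1..<4 corresponds to the ICU quantifier {1,3}, which is not confirmed here, and the ^ and $ anchors as well as the helper matchesSignedShortNumber are additions made for this illustration.

```swift
import Foundation

// Optional "-" followed by one to three digits.
// Assumption: `.quantified(1..<4)` corresponds to the ICU quantifier {1,3}.
// The ^ and $ anchors are added so the whole input has to match.
let signedShortNumber = "^-?\\d{1,3}$"

// Hypothetical helper, not part of SwiftRegexDSL.
func matchesSignedShortNumber(_ input: String) -> Bool {
    input.range(of: signedShortNumber, options: .regularExpression) != nil
}
```

With the raw string, nothing stops a typo such as {1,3 from compiling; the quantifier modifiers move that kind of mistake to compile time.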
Groups, whether for readability or for applying a modifier to several components at once, can be added to the body of a regex using Group.
Digit()
Group {
    Word()
    Digit()
}.zeroOrMore()
...
In addition, the DSL supports:
- A or B patterns using the Alternative regex
- Assertions using LookAheadAssertion, NegativeLookAheadAssertion, ... to match without advancing the input position
- Capturing groups using CaptureGroup for retrieving a range matching a subexpression
- Applying pattern options such as caseInsensitive using the .options(...) modifier on any pattern
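As a point of reference for the features listed above, this is what a look-ahead assertion, a capturing group, and the case-insensitive option look like when written by hand against NSRegularExpression. It is a sketch of the raw ICU syntax, not of how SwiftRegexDSL generates its patterns; the pattern and the helper capturedDigits are invented for this example.

```swift
import Foundation

// "id", then a captured run of digits, only when followed by ";".
// (?=;) is a look-ahead: it checks for ";" without consuming it.
let idPattern = "id(\\d+)(?=;)"

// Hypothetical helper: returns the digits captured by group 1, if any.
func capturedDigits(in input: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: idPattern,
                                               options: [.caseInsensitive]) else {
        return nil
    }
    let fullRange = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, options: [], range: fullRange),
          let digits = Range(match.range(at: 1), in: input) else {
        return nil
    }
    return String(input[digits])
}
```

Group indexes, escaping, and option flags are all the caller's responsibility here, which is the kind of bookkeeping the DSL components are meant to hide.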
Sets can be defined as an Array or as a Swift Set, using either Character values or ranges of characters.
Digit()
["a", "c"..."z"]
...
Exclusion sets can be created using the ExclusionSet structure.
Anchors, which match a particular position in the input string, can be added to the body using either StartAnchor or EndAnchor:
StartAnchor.line
Digit()
...
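For readers less familiar with anchors, the raw ICU counterpart of a line-start anchor is ^ combined with the anchorsMatchLines option. The sketch below uses plain NSRegularExpression; whether StartAnchor.line produces exactly this is an assumption, and leadingDigits is a helper invented for the example.

```swift
import Foundation

// Hypothetical helper: returns the digit found at the start of each line.
func leadingDigits(in input: String) -> [String] {
    // `.anchorsMatchLines` makes ^ match at the start of every line,
    // not only at the start of the whole string.
    guard let regex = try? NSRegularExpression(pattern: "^\\d",
                                               options: [.anchorsMatchLines]) else {
        return []
    }
    let fullRange = NSRange(input.startIndex..., in: input)
    return regex.matches(in: input, options: [], range: fullRange).compactMap { match in
        Range(match.range, in: input).map { String(input[$0]) }
    }
}
```

For the input "1a\nb2\n3c" this returns ["1", "3"]: the 2 is skipped because it does not sit at the start of a line.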
The DSL supports patterns by Unicode name, hexadecimal value, or property using UnsafeUnicode. It is considered unsafe, however, because the parameters are strings and the hexadecimal values are not range-checked.
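To give an idea of what such patterns look like in raw ICU syntax, here is a small sketch with Foundation. \p{Lu} matches by Unicode property (uppercase letter) and \x{E9} matches by hexadecimal code point. How UnsafeUnicode builds these internally is not shown here, and firstMatch(of:in:) is a helper written for this example.

```swift
import Foundation

// Hypothetical helper: returns the first substring matching `pattern`.
func firstMatch(of pattern: String, in input: String) -> String? {
    guard let range = input.range(of: pattern, options: .regularExpression) else {
        return nil
    }
    return String(input[range])
}

// By property: the first uppercase letter.
let upper = firstMatch(of: "\\p{Lu}", in: "hello World")
// By hexadecimal code point: U+00E9 (e with acute accent).
let eAcute = firstMatch(of: "\\x{E9}", in: "caf\u{E9}")
```

Since the property name and the code point are plain strings, a typo in either is not caught by the compiler, which is why the type carries the Unsafe prefix.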
SwiftRegexDSL is designed to be extensible, so you can compose your regex from other regexes:
import SwiftRegexDSL

struct DomainRegex: Regex { ... }
struct ExtensionRegex: Regex { ... }

struct HostRegex: Regex {
    var body: Regex {
        DomainRegex()
        "."
        ExtensionRegex()
    }
}
Since a regex is defined as a struct and the DSL supports control flow, it is easy to define parameters as properties of the type:
import SwiftRegexDSL

struct TitleRegex: Regex {
    let shouldStartWithDigit: Bool

    var body: Regex {
        if shouldStartWithDigit {
            Digit()
                .oneOrMore()
        }
        AnyCharacter()
            .oneOrMore()
    }
}
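If you are curious how an if can appear inside such a body, the following toy result builder shows the mechanism. It is not SwiftRegexDSL's implementation: PatternBuilder and titlePattern are hypothetical names, and the builder simply concatenates raw pattern strings. The point is that buildOptional is what lets an if without else contribute either a fragment or nothing.

```swift
// Toy result builder for illustration only (not SwiftRegexDSL's code).
@resultBuilder
enum PatternBuilder {
    // Each statement in the body becomes one fragment; fragments are joined.
    static func buildBlock(_ parts: String...) -> String {
        parts.joined()
    }

    // Enables `if` without `else`: a skipped branch contributes nothing.
    static func buildOptional(_ part: String?) -> String {
        part ?? ""
    }
}

// Hypothetical counterpart of TitleRegex, producing a raw pattern string.
@PatternBuilder
func titlePattern(shouldStartWithDigit: Bool) -> String {
    if shouldStartWithDigit {
        "\\d+"
    }
    ".+"
}
```

Calling titlePattern(shouldStartWithDigit: true) yields the pattern \d+.+ and passing false yields .+ alone.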
If the framework is missing something, such as a regex metacharacter, you can use an UnsafeRawText in your regex body, since Text values in the DSL are automatically escaped for safety.
If such a case appears, don't hesitate to contribute to the framework to improve the coverage of the regex standard.
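To see why escaping matters, here is a sketch using Foundation's NSRegularExpression.escapedPattern(for:). Whether SwiftRegexDSL relies on this exact call is an assumption; the example only shows the difference between treating a string as literal text and treating it as a raw pattern. The helper matches(_:pattern:) is made up for the illustration.

```swift
import Foundation

let literal = "1+1=2"
// Escaped: the "+" is matched as a literal character.
let escaped = NSRegularExpression.escapedPattern(for: literal)

// Hypothetical helper: true when `pattern` matches somewhere in `input`.
func matches(_ input: String, pattern: String) -> Bool {
    input.range(of: pattern, options: .regularExpression) != nil
}

// Raw: "1+" means "one or more 1s", so the pattern matches "11=2"
// but not the literal text it was written from.
let rawMatchesLiteral = matches("1+1=2", pattern: literal)
let escapedMatchesLiteral = matches("1+1=2", pattern: escaped)
print(rawMatchesLiteral, escapedMatchesLiteral)
```

The raw form is what you want when you deliberately need a metacharacter the DSL does not cover, and the escaped form is the safe default for everything else.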
When your regex is ready, the framework offers various extensions on String:
- check if a string matches a regex using .match(regex)
- find the range in a string matching the regex using .range(of: regex)
- find the ranges of capturing groups using .capturedGroupsRanges(of: regex, with: "foo")
- replace a part of a string using .replacingOccurrences(of: regex, with: "foo")

An NSRegularExpression can also be created from a Regex.
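For comparison, these are the Foundation calls you would write by hand with a raw string pattern for the match, range, and replace cases. That the extensions above wrap something similar is an assumption, and the sample text and pattern are made up.

```swift
import Foundation

let sampleText = "Order 66 shipped"
let digitsPattern = "\\d+"

// Check whether the string contains a match.
let hasMatch = sampleText.range(of: digitsPattern, options: .regularExpression) != nil

// Find the range of the first match and extract it.
let firstNumber = sampleText
    .range(of: digitsPattern, options: .regularExpression)
    .map { String(sampleText[$0]) }

// Replace every match.
let replaced = sampleText.replacingOccurrences(of: digitsPattern,
                                               with: "foo",
                                               options: .regularExpression)
```

Each call needs the .regularExpression option repeated, and the pattern remains an unchecked string throughout.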
This is a young project and many improvements can be made:
- Unicode regex by name or property offers tons of possibilities. Having an enum to describe the most used ones could be a nice addition, offering safety and discoverability.
- Retrieving the ranges of capturing groups is still not very convenient with the framework. Perhaps a system could exist where a binding or callback is directly defined within the body of the Regex.
- Matching a digit is easy with a regex, but matching a number is always a pain. Take IPv4, for instance, where each part must not exceed 255. The result is a long regex that is hard to read and painful to create. Here is the way to match 0 to 255:
25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?
Creating a regex that matches a closed range of numbers would be a time saver.
- Anchors are currently a little bit raw and can be placed anywhere in an expression. There is certainly more safety and convenience to add around them.
- Sets are limited to Character, which means that for now, digits have to be written using their Character representation.
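The 0 to 255 pattern from the list above can be checked with plain Foundation. This sketch wraps it in a non-capturing group and adds ^ and $ so the whole input has to match; isOctet and isIPv4 are helper names made up for the example. Note that the pattern also accepts leading zeros such as 007.

```swift
import Foundation

// The 0 to 255 pattern, wrapped in a non-capturing group for reuse.
let octet = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

// Hypothetical helper: true when `input` is a number from 0 to 255.
func isOctet(_ input: String) -> Bool {
    input.range(of: "^\(octet)$", options: .regularExpression) != nil
}

// Hypothetical helper: four octets separated by dots.
func isIPv4(_ input: String) -> Bool {
    input.range(of: "^\(octet)(?:\\.\(octet)){3}$", options: .regularExpression) != nil
}
```

Even with the octet factored out into a constant, the pattern says little about its intent, which is the motivation for a dedicated closed-range component.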
A final word: I am not an expert in regexes, hence the existence of this framework to ease my pain when working with them, so I may have missed things and made some mistakes. By making it open source, I hope contributions from the Swift community will bring it to the next level.