Hello.
I have a simple string which looks like this:
C#
string test = "token1 token2 token3 complex token";

I need to split this string into tokens with a regex. The token names come from a fixed set, but they could be, say, token4, token5, complex token2, etc.

What I have tried:

I tried the following:
C#
var result = Regex.Matches(test, "(token1) (token2) (complex token)").Cast<Match>().Select(m => m.Value).ToList();

This does not work. The list is either empty or contains a single string holding all the tokens, like this: token1 token2 complex token. For this case, I need it to contain three strings, one per token.
Can someone help please?
Regards
Updated 6-Feb-20 3:34am

Your regex will only match the literal string "token1 token2 complex token". You'll only get multiple matches if your input string contains that literal string multiple times.
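To see why, the sketch below (illustrative only, using the standard System.Text.RegularExpressions API) runs the original pattern against an input that contains the three tokens in exactly that order. There is a single match whose Value is the whole sequence; the individual tokens live in the capture groups. With the question's actual input, "token3" sits in the way, so nothing matches at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Input containing the three tokens in exactly the order the pattern expects.
string input = "token1 token2 complex token";

// The original pattern: one match for the whole sequence, three capture groups.
Match m = Regex.Match(input, "(token1) (token2) (complex token)");

// m.Value is the entire matched text, which is why the original code produced
// one string. Groups[0] is that same whole match, so it is skipped here.
List<string> tokens = m.Success
    ? m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToList()
    : new List<string>();

Console.WriteLine(m.Value);                    // token1 token2 complex token
Console.WriteLine(string.Join(" | ", tokens)); // token1 | token2 | complex token

// With the question's input, "token3" sits between token2 and "complex token",
// so the literal sequence never occurs and there is no match.
Console.WriteLine(Regex.IsMatch("token1 token2 token3 complex token",
    "(token1) (token2) (complex token)"));     // False
```

Groups only help when you know the exact order and count of tokens in advance; the alternation below does not have that limitation.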

Assuming you just want to extract the individual tokens, try:
C#
var result = Regex.Matches(test, @"(token1)|(token2)|(complex token)").Cast<Match>().Select(m => m.Value).ToList();
Regular Expression Language - Quick Reference | Microsoft Docs
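Since the question says the token names come from a fixed set (token4, complex token2, and so on), the alternation can be built from a list rather than typed by hand. This is a minimal sketch under that assumption; the list contents and the Tokenize helper are hypothetical. Longest names go first so "complex token2" is tried before its prefix "complex token", Regex.Escape protects names containing regex metacharacters, and \b keeps "token1" from matching inside "token10".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical list of known token names; in practice this comes from
// wherever the fixed token set is defined.
var tokenNames = new List<string> { "token1", "token2", "token3", "complex token", "complex token2" };

// Longest names first, escaped, wrapped in word boundaries:
// \b(?:complex\ token2|complex\ token|token1|token2|token3)\b
string pattern = @"\b(?:" +
    string.Join("|", tokenNames.OrderByDescending(n => n.Length).Select(n => Regex.Escape(n))) +
    @")\b";

// Hypothetical helper: returns every known token found in the text, in order.
List<string> Tokenize(string text) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToList();

var result = Tokenize("token1 token2 token3 complex token complex token2 token10");
Console.WriteLine(string.Join(" | ", result));
// token1 | token2 | token3 | complex token | complex token2
```

Note that "token10" is not reported, because the boundary check rejects the partial match "token1".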
 
Comments
Pete O'Hanlon 6-Feb-20 9:40am    
And that's the way to solve it. My 5.
Maciej Los 6-Feb-20 9:55am    
5ed!
You need a tokenizer, I think.

At the risk of plugging my own work, try this: Rolex (Reboot): Unicode Enabled Lexer Generator in C#

// rolex lex spec
complexToken = 'complex token[0-9]*'
token = '[A-Za-z][A-Za-z0-9]*'
int = '\-?[0-9]+' // just an example
ws<hidden> = '[ \t]+' // hide whitespace


Then you can tokenize like this:

C#
foreach (var tok in new MyTokenizer("my test string"))
    Console.WriteLine("{0}: {1} at {2}", tok.SymbolId, tok.Value, tok.Position);


It generates a single file with no dependencies. The input is an IEnumerable<char>, and the output is an IEnumerable<Token>.
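If you want to try the idea without a generator, here is a minimal hand-rolled sketch of a regex-driven lexer that yields a stream of (SymbolId, Value, Position) tuples. It is an assumption for illustration, not Rolex's generated code: the rule names just mirror the lex spec above, and Lex and rules are hypothetical names. One important difference is that .NET regex alternation takes the first rule that matches rather than the longest match a generated lexer uses, so the more specific complexToken rule must be listed before token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical rule table, tried in order; first match wins.
var rules = new (string Name, string Pattern)[]
{
    ("complexToken", @"complex token[0-9]*"),
    ("token",        @"[A-Za-z][A-Za-z0-9]*"),
    ("int",          @"-?[0-9]+"),
    ("ws",           @"[ \t]+"),
};

// One alternation of named groups. \G anchors each match exactly where the
// previous one ended, so unrecognized characters are reported, not skipped.
var lexer = new Regex(@"\G(?:" +
    string.Join("|", rules.Select(r => $"(?<{r.Name}>{r.Pattern})")) + ")");

IEnumerable<(string SymbolId, string Value, int Position)> Lex(string input)
{
    int pos = 0;
    while (pos < input.Length)
    {
        Match m = lexer.Match(input, pos);
        if (!m.Success)
        {
            // No rule accepts this character: emit an error token and move on.
            yield return ("#ERROR", input[pos].ToString(), pos);
            pos++;
            continue;
        }
        // The rule whose named group participated in the match.
        string name = rules.First(r => m.Groups[r.Name].Success).Name;
        if (name != "ws") // hide whitespace, like ws<hidden> in the spec
            yield return (name, m.Value, pos);
        pos += m.Length;
    }
}

foreach (var tok in Lex("token1 token2 complex token complex token2 -42"))
    Console.WriteLine("{0}: {1} at {2}", tok.SymbolId, tok.Value, tok.Position);
// token: token1 at 0
// token: token2 at 7
// complexToken: complex token at 14
// complexToken: complex token2 at 28
// int: -42 at 43
```

A generated DFA lexer avoids both the rule-ordering caveat and the cost of regex backtracking, which is the main reason to reach for one once the token set grows.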

If you need to read the input from a file, use this I/O library: A Small Streaming I/O and UTF-32 Library
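If you would rather stay within the base class library, a plain StreamReader can also feed a file to anything that consumes IEnumerable<char>, one character at a time. This is a swapped-in alternative, not the linked library's API, and ReadChars is a hypothetical helper. It yields UTF-16 code units, so unlike a UTF-32 reader it does not combine surrogate pairs into single code points.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: lazily yields a text file's characters so the whole
// file never has to be loaded into memory at once.
// These are UTF-16 code units; surrogate pairs are NOT merged into code points.
IEnumerable<char> ReadChars(string filePath)
{
    using var reader = new StreamReader(filePath); // detects BOM, defaults to UTF-8
    int c;
    while ((c = reader.Read()) != -1)
        yield return (char)c;
}

// Demo: write a temporary file, then stream its characters back.
string tempFile = Path.GetTempFileName();
File.WriteAllText(tempFile, "token1 token2 complex token");
string roundTrip = new string(ReadChars(tempFile).ToArray());
Console.WriteLine(roundTrip); // token1 token2 complex token
File.Delete(tempFile);
```

Because the reader is disposed when enumeration finishes, the file is closed before File.Delete runs.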
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


