
Parsing Strings in C++

Whichever way you look at it, splitting a string apart is never a simple job,
even when the string format itself is simple.

The Simplest Example

First, here is how it is done in C:

```c
#include <stdio.h>

int main(void)
{
    char str[] = "Tom 42";
    int value;
    char name[30];
    /* %29s caps the read at 29 chars so it cannot overflow name[30] */
    if (sscanf(str, "%29s %d", name, &value) == 2)
        printf("%s %d\n", name, value);
    return 0;
}
```

The C++ version:

```cpp
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string str = "Tom 42";
    int value;
    std::string name;
    std::stringstream ss;
    ss << str;              // load the string into the stream
    ss >> name >> value;    // operator>> splits on whitespace
    std::cout << name << " " << value << "\n";
}
```

The two versions look almost the same, and neither needs any heavy weapons.
But what happens when the input gets more complicated?

A More Complex Case

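For example, suppose the input string is Tom: 42.
Without any changes, name receives "Tom:", which is not what we want.
Suppose we change the format string in the C version to: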

```c
sscanf(str, "%s: %d", name, &value);
```

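That still does not give what we want. %s stops only at whitespace, so name still receives "Tom:". The literal ':' in the format then fails to match the space that follows, so value is never assigned.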
The C++ version gives the same result if left unchanged.
Suppose we change it to this:

```cpp
int value;
std::string name, unused;
ss >> name >> unused >> value;
```

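Clearly this is wrong too. name still gets "Tom:", unused swallows "42", and the read into value fails because nothing is left.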

Strtok

Short of bringing out the heavy weapons, this is the only approach I could come up with:

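```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char str[] = "Tom: 42";       /* strtok writes into its argument, so use an array */
    char name[30];
    int value;
    char *p = strtok(str, ":");   /* first token: "Tom" */
    if (p == NULL)
        return 1;
    snprintf(name, sizeof name, "%s", p);  /* bounded copy instead of strcpy */
    p = strtok(NULL, ":");        /* second token: " 42" */
    if (p == NULL)
        return 1;
    value = atoi(p);              /* atoi skips the leading space */
    printf("%s %d\n", name, value);
    return 0;
}
```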

It solves the problem, but the code is fragmented and a pain to maintain.
And what if the string becomes Tom: 42, 123?

Boost Spirit

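The world has no shortage of clever people, and someone came up with a beautifully elegant approach:
embed a DSL parser directly in C++. That is exactly what Boost Spirit does.
If it has a drawback, it is that debugging is painful when something goes wrong.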

```cpp
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main()
{
    std::string str = "Tom: 42";
    typedef std::string::const_iterator It;
    It iter = str.begin(), end = str.end();
    int value;
    std::string name;
    // name: every character up to ':', then a literal ':', then an integer.
    // qi::ascii::blank is the skipper, so spaces and tabs between tokens are ignored.
    bool r = qi::phrase_parse(iter, end,
        (*(qi::char_ - ':') >> ":" >> qi::int_), qi::ascii::blank,
        name, value);

    if (r && iter == end) {
        std::cout << "Parsing succeeded\n";
        std::cout << name << " " << value << "\n";
    }
    else {
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
    }
}
```

As mentioned above, if the string becomes Tom: 42, 123,
we can write the parser as:

```cpp
int value1;  // receives the second integer
bool r = qi::phrase_parse(iter, end,
    (*(qi::char_ - ':') >> ":" >> qi::int_ >> "," >> qi::int_), qi::ascii::blank,
    name, value, value1);
```

Remarkably clever.

Rule & Grammar

As the strings to parse grow more complex, you can define your own grammar and rules:

```cpp
#include <iostream>
#include <string>
#include <boost/tuple/tuple.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/boost_tuple.hpp>  // lets Spirit fill a boost::tuple

using MyTuple = boost::tuple<std::string, int, int>;
namespace qi = boost::spirit::qi;

template <typename Iterator, typename Skipper = qi::ascii::blank_type>
struct StringGrammar : qi::grammar<Iterator, MyTuple(), Skipper> {
    StringGrammar() : StringGrammar::base_type(parser, "StringParser Grammar") {
        name = *(qi::char_ - ':');
        value = qi::int_;

        parser = name >> ":" >> value >> "," >> value;
    }
private:
    qi::rule<Iterator, MyTuple(), Skipper> parser;  // start rule
    // sub-rules (declared with the Skipper, so blanks are skipped inside them too)
    qi::rule<Iterator, std::string(), Skipper> name;
    qi::rule<Iterator, int(), Skipper> value;
};

int main()
{
    std::string str = "Tom: 42, 123";
    typedef std::string::const_iterator It;
    StringGrammar<It> grammar;
    It iter = str.begin(), end = str.end();
    MyTuple v;
    bool r = qi::phrase_parse(iter, end,
        grammar, qi::ascii::blank,
        v);

    if (r && iter == end) {
        std::cout << "Parsing succeeded\n";
        std::cout << boost::get<0>(v) << " " << boost::get<1>(v) << " " << boost::get<2>(v) << "\n";
    }
    else {
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
    }
}
```

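After some testing, I found that the grammar no longer works if the Skipper template parameter is omitted. Presumably this is because Qi treats a rule declared without a skipper as an implicit lexeme that does not skip whitespace inside itself, so the blanks in "Tom: 42, 123" are never skipped.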
