However you look at it, splitting a string is never a simple job,
even when the string format itself is simple.
The Simplest Example
First, the C way of doing it:
```c
#include <stdio.h>

int main()
{
    char str[] = "Tom 42";
    int value;
    char name[30];
    /* %29s reads one whitespace-delimited word and cannot overflow name[30] */
    sscanf(str, "%29s %d", name, &value);
    printf("%s %d\n", name, value);
}
```
The C++ version:
```cpp
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string str = "Tom 42";
    int value;
    std::string name;
    std::stringstream ss;
    ss << str;              // load the input into the stream
    ss >> name >> value;    // operator>> splits on whitespace
    std::cout << name << " " << value << "\n";
}
```
The two look almost identical, and neither needs any heavier weapons.
But what if the situation gets more complicated?
A More Complex Issue
Suppose the input string is Tom: 42.
If we change nothing, name ends up as Tom: (colon included), which is not what we want.
What if we change the format string in the C version:
```c
sscanf(str, "%s: %d", name, &value);
```
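The result still isn't what we want. %s stops only at whitespace, so it swallows Tom: including the colon; the literal : in the format then meets the space, matching stops, and value is never assigned (sscanf returns 1).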
The C++ version gives the same result if left unchanged.
What if we change it to this:
```cpp
int value;
std::string name, unused;
ss >> name >> unused >> value;
```
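Clearly this is wrong too: >> into a std::string still splits only on whitespace, so name gets Tom:, unused gets 42, and extracting value fails because the stream is already exhausted.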
Strtok
Without bringing out the heavy weapons, this is the only approach I could come up with:
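```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    char str[] = "Tom: 42";
    char *p = strtok(str, ":");         /* first token: "Tom" */
    int value;
    char name[30];
    if (p == NULL)
        return 1;
    strncpy(name, p, sizeof name - 1);  /* copy at most 29 chars */
    name[sizeof name - 1] = '\0';
    p = strtok(NULL, ":");              /* second token: " 42" */
    if (p == NULL)
        return 1;
    value = atoi(p);                    /* atoi skips the leading space */
    printf("%s %d\n", name, value);
    return 0;
}
```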
It solves the problem, but the code is fragmented and a real pain to maintain.
And what if the string becomes Tom: 42, 123?
Boost Spirit
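There is no shortage of clever people in this world, and they came up with a ridiculously elegant approach:
embed a parser DSL directly in C++. That is exactly what Boost Spirit does.
If it has one drawback, it is that debugging is painful when something goes wrong.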
```cpp
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main()
{
    std::string str = "Tom: 42";
    typedef std::string::const_iterator It;
    It iter = str.begin(), end = str.end();
    int value;
    std::string name;

    // name: any characters except ':', then a literal ':', then an int.
    // qi::ascii::blank skips spaces and tabs between the pieces.
    bool r = qi::phrase_parse(iter, end,
                              (*(qi::char_ - ':') >> ":" >> qi::int_),
                              qi::ascii::blank,
                              name, value);

    if (r && iter == end) {
        std::cout << "Parsing succeeded\n";
        std::cout << name << " " << value << "\n";
    } else {
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
    }
}
```
Back to the question above: if the string changes to Tom: 42, 123,
we can write the parser as:
```cpp
// str is now "Tom: 42, 123"
int value, value1;
bool r = qi::phrase_parse(iter, end,
                          (*(qi::char_ - ':') >> ":" >> qi::int_ >> "," >> qi::int_),
                          qi::ascii::blank,
                          name, value, value1);
```
Remarkably clever.
Rule & Grammar
As the strings you need to parse grow more complex, you can define your own grammars and rules:
```cpp
#include <iostream>
#include <string>
#include <boost/tuple/tuple.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/boost_tuple.hpp>

using MyTuple = boost::tuple<std::string, int, int>;
namespace qi = boost::spirit::qi;

template <typename Iterator, typename Skipper = qi::ascii::blank_type>
struct StringGrammar : qi::grammar<Iterator, MyTuple(), Skipper>
{
    StringGrammar() : StringGrammar::base_type(parser, "StringParser Grammar")
    {
        name = *(qi::char_ - ':');                      // everything before ':'
        value = qi::int_;
        parser = name >> ":" >> value >> "," >> value;  // fills the 3-tuple
    }

private:
    qi::rule<Iterator, MyTuple(), Skipper> parser;
    qi::rule<Iterator, std::string(), Skipper> name;
    qi::rule<Iterator, int(), Skipper> value;
};

int main()
{
    std::string str = "Tom: 42, 123";
    typedef std::string::const_iterator It;
    StringGrammar<It> grammar;
    It iter = str.begin(), end = str.end();
    MyTuple v;
    bool r = qi::phrase_parse(iter, end, grammar, qi::ascii::blank, v);

    if (r && iter == end) {
        std::cout << "Parsing succeeded\n";
        std::cout << boost::get<0>(v) << " " << boost::get<1>(v) << " "
                  << boost::get<2>(v) << "\n";
    } else {
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
    }
}
```
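In my testing, if the trailing Skipper template parameter is omitted,
the whole grammar stops working. The likely reason is that a qi::rule declared without a skipper behaves as an implicit lexeme: it never skips the blanks after : and ,, so qi::int_ fails on the space in front of 42.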