Parsing Strings in C++

No matter how you look at it, splitting a string is never a simple job,
even when the string format is simple.

The Simplest Example

First, the C way of doing it:

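#include <stdio.h>
int main(void)
{
	char str[] = "Tom 42";
	int value;
	char name[30];
	/* %29s stops at whitespace and leaves room for the terminating '\0' */
	if (sscanf(str, "%29s %d", name, &value) != 2)
		return 1;	/* both fields must be converted */
	printf("%s %d\n", name, value);
	return 0;
}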

The C++ version:

#include <iostream>
#include <sstream>
#include <string>
int main()
{
	std::string str = "Tom 42";
	int value;
	std::string name;
	std::istringstream ss(str);	// read-only stream over the input
	if (!(ss >> name >> value))	// >> splits on whitespace
		return 1;
	std::cout << name << " " << value << "\n";
}

The two look almost the same, and nothing here calls for heavier weapons.
But what if the situation gets more complicated?

A More Complex Case

Suppose the input string is Tom: 42.
If nothing is changed, name ends up as "Tom:", which is not what we want.
If we change the format string in the C version to

sscanf(str, "%s: %d", name, &value);

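the result is still wrong. %s stops only at whitespace, so it swallows "Tom:" including the colon, and the ':' in the format then fails to match the space that follows.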
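If you want to stay with sscanf, one possible fix is a scanset. This is my own sketch, not part of the original approach. %29[^:] reads up to 29 characters that are not ':', which leaves the colon for the literal ':' in the format. %d then skips the blank before 42. The helper name parse_name_value is hypothetical, and the sketch assumes the name itself never contains ':'.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parses "name: number" (for example "Tom: 42").
// Returns true only when both fields were converted.
bool parse_name_value(const char* input, std::string& name, int& value)
{
    char buf[30];
    // %29[^:] : up to 29 characters that are not ':' (room left for '\0')
    // :       : the literal colon must come next
    // %d      : skips the blank, then reads the integer
    if (std::sscanf(input, "%29[^:]:%d", buf, &value) != 2)
        return false;
    name = buf;
    return true;
}
```

Any spaces before the colon (as in "Tom : 42") stay in the name, because the scanset only stops at ':'.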
The C++ version gives the same result if left unchanged (name gets "Tom:").
What if we change it like this:

int value;
std::string name, unused;
ss >> name >> unused >> value;	// name = "Tom:", unused = "42", reading value fails

Clearly this is wrong too. name still gets "Tom:", unused swallows "42", and nothing is left for value.
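You can keep using streams with one change. Read the name with std::getline, using ':' as the delimiter, then read the number with >>. This is again my own sketch, and parse_with_getline is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Sketch: read everything before ':' as the name, then let operator>>
// skip the blank and convert the number. Returns false if either step fails.
bool parse_with_getline(const std::string& input, std::string& name, int& value)
{
    std::istringstream ss(input);
    // std::getline with ':' as delimiter consumes the ':' but does not store it.
    if (!std::getline(ss, name, ':'))
        return false;
    // operator>> skips leading whitespace before reading the int.
    return static_cast<bool>(ss >> value);
}
```

With "Tom: 42" this yields "Tom" and 42. With "Tom 42" it returns false, because getline consumes the whole string and nothing is left for the number.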

Strtok

Short of bringing in the heavy weapons, this is the only approach I could think of:

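#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void)
{
	char str[] = "Tom: 42";
	char *p = strtok(str, ":");	/* first token: "Tom" */
	int value;
	char name[30];
	if (p == NULL)
		return 1;
	snprintf(name, sizeof name, "%s", p);	/* bounded copy, unlike strcpy */
	p = strtok(NULL, ":");	/* second token: " 42" */
	if (p == NULL)
		return 1;
	value = atoi(p);	/* atoi skips the leading space */
	printf("%s %d\n", name, value);
	return 0;
}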

It does solve the problem, but the code is fragmented and a real pain to maintain.
And what if the string becomes Tom: 42, 123?
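For comparison, this is how far the standard streams alone can take the three-field case. It is a sketch under my own assumptions: the Record struct and parse_record are hypothetical names, and the name must not contain ':'. Notice that every separator still has to be checked by hand.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for input such as "Tom: 42, 123".
struct Record {
    std::string name;
    int first = 0;
    int second = 0;
};

// Sketch using only standard streams; each separator is checked manually.
bool parse_record(const std::string& input, Record& out)
{
    std::istringstream ss(input);
    char sep = 0;
    if (!std::getline(ss, out.name, ':'))        // "Tom", ':' is consumed
        return false;
    if (!(ss >> out.first >> sep) || sep != ',') // 42, then the ','
        return false;
    if (!(ss >> out.second))                     // 123
        return false;
    char extra = 0;
    return !(ss >> extra);                       // only blanks may remain
}
```

parse_record("Tom: 42, 123", r) fills r with "Tom", 42 and 123. "Tom: 42" and "Tom: 42; 123" are both rejected.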

Boost Spirit

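The world has no shortage of clever people, and someone came up with an outrageously elegant approach:
embed a parser DSL right inside C++, which is exactly what Boost Spirit does.
If there is a drawback, it is that debugging is painful when something goes wrong.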

#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main()
{
	std::string str = "Tom: 42";
	typedef std::string::const_iterator It;
	It iter = str.begin(), end = str.end();
	int value = 0;
	std::string name;
	// name: any characters up to ':', then a literal ':', then an int.
	// The skipper qi::ascii::blank skips spaces and tabs before each element
	// (including inside the name).
	bool r = qi::phrase_parse(iter, end,
		(*(qi::char_ - ':') >> ":" >> qi::int_), qi::ascii::blank,
		name, value);

	if (r && iter == end) {	// success only if the whole input was consumed
		std::cout << "Parsing succeeded\n";
		std::cout << name << " " << value << "\n";
	}
	else {
		std::cout << "Parsing failed\n";
		std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
	}
}

As mentioned above, if the string changes to Tom: 42, 123,
we can write the parser as:

int value1 = 0;	// receives the second number
bool r = qi::phrase_parse(iter, end,
	(*(qi::char_ - ':') >> ":" >> qi::int_ >> "," >> qi::int_), qi::ascii::blank,
	name, value, value1);

Remarkably clever.

Rules & Grammars

Once the strings to parse get more complex, you can define your own grammar and rules:

#include <iostream>
#include <string>
#include <boost/tuple/tuple.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/boost_tuple.hpp>	// lets Spirit fill a boost::tuple
using MyTuple = boost::tuple<std::string, int, int>;
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper = qi::ascii::blank_type>
struct StringGrammar : qi::grammar<Iterator, MyTuple(), Skipper> {
	StringGrammar() : StringGrammar::base_type(parser, "StringParser Grammar") {
		name = *(qi::char_ - ':');	// everything up to the ':'
		value = qi::int_;

		// "name: value, value" -> (std::string, int, int)
		parser = name >> ":" >> value >> "," >> value;
	}
private:
	qi::rule<Iterator, MyTuple(), Skipper> parser;
	// sub-rules; they also take the Skipper, so blanks between tokens are skipped
	qi::rule<Iterator, std::string(), Skipper> name;
	qi::rule<Iterator, int(), Skipper> value;
};
int main()
{
	std::string str = "Tom: 42, 123";
	typedef std::string::const_iterator It;
	StringGrammar<It> grammar;
	It iter = str.begin(), end = str.end();
	MyTuple v;
	bool r = qi::phrase_parse(iter, end,
		grammar, qi::ascii::blank,
		v);

	if (r && iter == end) {
		std::cout << "Parsing succeeded\n";
		std::cout << boost::get<0>(v) << " " << boost::get<1>(v) << " " << boost::get<2>(v) << "\n";
	}
	else {
		std::cout << "Parsing failed\n";
		std::cout << "stopped at: \"" << std::string(iter, end) << "\"\n";
	}
}

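In my testing, if the trailing Skipper template parameter is omitted, the grammar no longer works properly. The most likely reason is that a rule declared without a skipper type does no skipping inside itself (Spirit treats it as an implicit lexeme). The blanks after ':' and ',' are then no longer skipped, and the parse fails.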
