Overview
ofsoftswitch13
is an open-source SDN software switch supporting OpenFlow 1.3, developed by the CPqD team with support from Ericsson. The project is based on the OpenFlow 1.0 software switch from Stanford University, with support for the OpenFlow 1.3 protocol as the main addition.
ofsoftswitch13
is a lightweight SDN software switch implemented in C, with a code base of roughly 50,000 to 100,000 lines. It implements the OpenFlow 1.3 protocol fairly completely. Its two most important components are udatapath and secchan: udatapath is the packet-processing path inside the switch, and secchan manages the switch's connection to the controller.
This article dissects the udatapath component from three angles: first, the core data structures of udatapath; second, how the switch receives and processes messages from the control plane; third, how the switch processes packets arriving from the data plane.
Core Data Structures
Basic structures
Struct datapath
The most basic and most important data structure in the software switch. One datapath
corresponds to one OpenFlow switch instance; it records the switch's description strings, its ports, its flow tables and group tables, its connections to remote controllers, and so on.
```c
struct datapath {  /* full definition in udatapath/datapath.h */
```
For reference, the modules tied to datapath include (a trimmed sketch of the struct follows the list):
- dp_ports
- flow entry, flow table
- group entry, group table
- meter table
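To make this concrete, here is a hand-trimmed sketch of the state a datapath carries. Field names follow the upstream udatapath/datapath.h, but the body is abridged rather than verbatim:

```c
struct datapath {
    uint64_t id;                        /* Unique datapath (switch) id. */
    char *mfr_desc, *hw_desc, *sw_desc; /* Switch description strings. */

    struct list remotes;                /* Connections to remote controllers. */

    struct pipeline *pipeline;          /* The flow tables (see below). */
    struct group_table *groups;         /* Group table. */
    struct meter_table *meters;         /* Meter table. */

    size_t ports_num;                   /* Number of switch ports. */
    /* ... port array, packet buffers, config, experimenter hooks ... */
};
```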
Struct ofpbuf
ofpbuf
is the structure in which a packet received on an ingress port is stored inside the switch; it lives in the lib directory. This is the initial form a packet takes after entering the switch's datapath.

```c
/* Buffer for holding arbitrary data. An ofpbuf is automatically reallocated
* as necessary if it grows too large for the available memory. */
struct ofpbuf {
void *base; /* First byte of malloc()'d area. */
size_t allocated; /* Number of bytes allocated. */
uint8_t conn_id; /* Connection ID. Application-defined value to
associate a connection to the buffer. */
void *data; /* First byte actually in use. */
size_t size; /* Number of bytes in use. */
void *l2; /* Link-level header. */
void *l3; /* Network-level header. */
void *l4; /* Transport-level header. */
void *l7; /* Application data. */
struct ofpbuf *next; /* Next in a list of ofpbufs. */
void *private_p; /* Private pointer for use by owner. */
};
```
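As a quick usage sketch (ofpbuf_new(), ofpbuf_put() and ofpbuf_delete() are helpers from lib/ofpbuf.h; the frame bytes here are invented):

```c
#include "ofpbuf.h"

static void
buffer_frame(const uint8_t *frame, size_t frame_len)
{
    struct ofpbuf *b = ofpbuf_new(frame_len); /* allocates 'base' */
    ofpbuf_put(b, frame, frame_len);          /* copies frame, sets data/size */
    /* b->l2, b->l3, b->l4 and b->l7 are filled in later, when the packet
     * headers are parsed. */
    ofpbuf_delete(b);                         /* frees data and the struct */
}
```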
Control Plane
Struct sender
The sender struct represents one exchange between the switch and the controller, effectively the identifier of one connection; through it the switch can reply directly to a message the controller sent. It is defined in udatapath/datapath.h.

```c
/* The origin of a received OpenFlow message, to enable sending a reply. */
struct sender {
struct remote *remote; /* The device that sent the message. */
uint8_t conn_id; /* The connection that sent the message */
uint32_t xid; /* The OpenFlow transaction ID. */
};
```
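For example, answering a request only requires the sender that arrived with it; dp_send_message(), which we will meet in remote_rconn_run() below, stamps the reply with sender->xid and routes it over the right connection. A hypothetical sketch modeled on the echo handler:

```c
/* Hypothetical sketch: answer an echo request on the connection it came on. */
static void
reply_echo(struct datapath *dp, struct ofl_msg_echo *req,
           const struct sender *sender)
{
    struct ofl_msg_echo reply = {{.type = OFPT_ECHO_REPLY},
                                 .data_length = req->data_length,
                                 .data        = req->data};

    /* dp_send_message() copies sender->xid into the reply header and picks
     * the right rconn via sender->remote / sender->conn_id. */
    dp_send_message(dp, (struct ofl_msg_header *)&reply, sender);
}
```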
Struct remote
The remote struct represents one connection to the secure channel, i.e. one connection between the switch and a controller. It is defined in udatapath/datapath.h; only its leading comment survives in this excerpt:

```c
/* A connection to a secure channel. */
```
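A hedged sketch of the fields that matter for this article, with names taken from how remote is used elsewhere in udatapath/datapath.c:

```c
struct remote {
    struct list node;        /* Linked into dp->remotes. */
    struct rconn *rconn;     /* Main connection to the controller. */
    struct rconn *rconn_aux; /* Auxiliary connection, may be absent. */
    uint32_t role;           /* Controller role (OFPCR_ROLE_*). */
    /* ... bookkeeping for reliable/multipart messages omitted ... */
};
```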
Struct rconn
The rconn struct represents a reliable connection between controller and switch; it is defined in lib/rconn.c. The source comment explains: "A wrapper around vconn that provides queuing and optionally reliability. An rconn maintains a message transmission queue of bounded length specified by the caller. The rconn does not guarantee reliable delivery of queued messages: all queued messages are dropped when reconnection becomes necessary. An rconn optionally provides reliable communication, in this sense: the rconn will re-connect, with exponential backoff, when the underlying vconn disconnects."

```c
/* A reliable connection to an OpenFlow switch or controller.
*
* See the large comment in rconn.h for more information. */
struct rconn {
enum state state;
time_t state_entered;
struct vconn *vconn;
char *name;
bool reliable;
struct ofp_queue txq;
int backoff;
int max_backoff;
time_t backoff_deadline;
time_t last_received;
time_t last_connected;
unsigned int packets_sent;
unsigned int seqno;
/* In S_ACTIVE and S_IDLE, probably_admitted reports whether we believe
* that the peer has made a (positive) admission control decision on our
* connection. If we have not yet been (probably) admitted, then the
* connection does not reset the timer used for deciding whether the switch
* should go into fail-open mode.
*
* last_admitted reports the last time we believe such a positive admission
* control decision was made. */
bool probably_admitted;
time_t last_admitted;
/* These values are simply for statistics reporting, not used directly by
* anything internal to the rconn (or the secchan for that matter). */
unsigned int packets_received;
unsigned int n_attempted_connections, n_successful_connections;
time_t creation_time;
unsigned long int total_time_connected;
/* If we can't connect to the peer, it could be for any number of reasons.
* Usually, one would assume it is because the peer is not running or
* because the network is partitioned. But it could also be because the
* network topology has changed, in which case the upper layer will need to
* reassess it (in particular, obtain a new IP address via DHCP and find
* the new location of the controller). We set this flag when we suspect
* that this could be the case. */
bool questionable_connectivity;
time_t last_questioned;
/* Throughout this file, "probe" is shorthand for "inactivity probe".
* When nothing has been received from the peer for a while, we send out
* an echo request as an inactivity probe packet. We should receive back
* a response. */
int probe_interval; /* Secs of inactivity before sending probe. */
/* Messages sent or received are copied to the monitor connections. */
#define MAX_MONITORS 8
struct vconn *monitors[MAX_MONITORS];
size_t n_monitors;
/* Protocol statistical information. */
/* TODO Zoltan: Temporarily removed when moving to OpenFlow 1.1 */
/*
struct ofpstat ofps_rcvd;
struct ofpstat ofps_sent;
*/
uint32_t idle_echo_xid;
};
```
Struct vconn
Defined in lib/vconn-provider.h. Per the source comment: "Active virtual connection to an OpenFlow device. This structure should be treated as opaque by vconn implementations."

```c
struct vconn {
struct vconn_class *class;
int state;
int error;
int min_version;
int version;
uint32_t ip;
char *name;
bool reconnectable;
struct ofpstat ofps_rcvd;
struct ofpstat ofps_sent;
};
```
Struct pvconn
Defined in lib/vconn-provider.h. Per the source comment: "Passive virtual connection to an OpenFlow device. This structure should be treated as opaque by vconn implementations."

```c
struct pvconn {
struct pvconn_class *class;
char *name;
};
```
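The passive/active split mirrors listen()/accept(): a pvconn waits for incoming connections and yields one active vconn per peer. A sketch of the pattern; pvconn_open() appears later in udatapath_cmd(), but pvconn_accept() and its exact signature are assumptions here:

```c
/* Sketch: listen passively and accept one controller connection. */
static void
accept_one(void)
{
    struct pvconn *listener;
    struct vconn *conn;

    if (!pvconn_open("ptcp:6632", &listener)) {   /* "TYPE:ARGS" form */
        /* Non-blocking: EAGAIN would mean "nothing pending yet". */
        if (!pvconn_accept(listener, OFP_VERSION, &conn)) {
            /* 'conn' can now carry OpenFlow traffic, e.g. inside an rconn. */
        }
    }
}
```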
Data Plane
Struct packet
The in-switch representation of a network packet arriving from the data plane, along with its processing state. Defined in udatapath/packet.h.

```c
struct packet {
struct datapath *dp;
struct ofpbuf *buffer; /* buffer containing the packet */
uint32_t in_port;
struct action_set *action_set; /* action set associated with the packet */
bool packet_out; /* true if the packet arrived in a packet out msg */
uint32_t out_group; /* OFPG_ANY = no out group */
uint32_t out_port; /* OFPP_ANY = no out port */
uint16_t out_port_max_len; /* max length to send, if out_port is OFPP_CONTROLLER */
uint32_t out_queue;
uint8_t table_id; /* table in which is processed */
uint32_t buffer_id; /* if packet is stored in buffer, buffer_id;
otherwise 0xffffffff */
struct packet_handle_std *handle_std; /* handler for standard match structure */
};
```
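A packet is normally built by wrapping a freshly received ofpbuf; the sketch below assumes the packet_create()/packet_destroy() helpers from udatapath/packet.h:

```c
/* Sketch: wrap a received ofpbuf into a struct packet and hand it off. */
static void
receive_one(struct datapath *dp, uint32_t in_port, struct ofpbuf *buffer)
{
    struct packet *pkt = packet_create(dp, in_port, buffer,
                                       false /* not from a packet-out */);
    /* pipeline_process_packet() takes ownership and frees pkt when done;
     * on error paths, call packet_destroy(pkt) yourself. */
    pipeline_process_packet(dp->pipeline, pkt);
}
```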
Struct pipeline
The pipeline struct is the switch's packet-processing pipeline: it matches packets against the flow tables and applies the instructions of the matching flow entry to the packet.

```c
/* A pipeline structure */
struct pipeline {
struct datapath *dp;
struct flow_table *tables[PIPELINE_TABLES];
};
```
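Building the pipeline amounts to allocating one flow table per table id. A slightly simplified sketch along the lines of pipeline_create() in udatapath/pipeline.c:

```c
struct pipeline *
pipeline_create(struct datapath *dp)
{
    struct pipeline *pl = xmalloc(sizeof *pl);
    int i;

    pl->dp = dp;
    for (i = 0; i < PIPELINE_TABLES; i++) {
        pl->tables[i] = flow_table_create(dp, i);  /* empty table i */
    }
    return pl;
}
```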
Processing Flow for OFS Control-Plane Messages
(1) Creating the datapath
The file udatapath/udatapath.c is the entry point for OpenFlow message handling in the switch. It is mainly responsible for creating a datapath according to the command the user entered; the core of the creation code is shown below:

```c
static struct datapath *dp;
int udatapath_cmd(int argc, char *argv[]) {
...
dp = dp_new(); /* create the datapath */
...
}
```
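dp_new() wires together the structures from the first half of this article. An abridged sketch (not verbatim) of what it sets up:

```c
struct datapath *
dp_new(void)
{
    struct datapath *dp = xmalloc(sizeof *dp);

    /* datapath id: randomly generated, overridable on the command line */
    list_init(&dp->remotes);            /* no controller connections yet */
    dp->pipeline = pipeline_create(dp); /* empty flow tables */
    dp->groups   = group_table_create(dp);
    dp->meters   = meter_table_create(dp);
    /* ... packet buffers, ports, default config ... */
    return dp;
}
```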
(2) Listening on the OpenFlow channel
Next the switch starts listening on the OpenFlow channel, as shown below. The function pvconn_open() starts listening on the OpenFlow connection named by its argument: the name of a passive virtual connection in the form "TYPE:ARGS", typically ptcp:6632. If listening succeeds, dp_add_pvconn() is called to add the passive virtual connection to the datapath's listen list.

```c
int udatapath_cmd(int argc, char *argv[])
{
...
const char *pvconn_name = argv[i];
struct pvconn *pvconn, *pvconn_aux = NULL;
...
/* Attempts to start listening for OpenFlow connections. 'name' is a
* connection name in the form "TYPE:ARGS", where TYPE is an passive vconn
* class's name and ARGS are vconn class-specific.*/
retval = pvconn_open(pvconn_name, &pvconn);
...
dp_add_pvconn(dp, pvconn, pvconn_aux);
...
}
```
Once the datapath has established its connection to the controller, the main function hands the created datapath to dp_run(), which is called in an endless loop to process all messages related to that datapath:

```c
for (;;) {
dp_run(dp);
dp_wait(dp);
poll_block();
}
```
(3) Receiving OpenFlow messages
dp_run() is implemented in udatapath/datapath.c. Specifically, dp_run() calls remote_run() to receive the OpenFlow control messages arriving at the switch:

```c
void dp_run(struct datapath *dp) {
struct remote *r, *rn;
...
dp_ports_run(dp);
/* Talk to remotes. */
LIST_FOR_EACH_SAFE (r, rn, struct remote, node, &dp->remotes) {
remote_run(dp, r);
}
...
}
remote_run() determines the connection type and receives messages by calling remote_rconn_run(), which is therefore the immediate driver of OpenFlow message reception. Both live in udatapath/datapath.c:

```c
static void
remote_run(struct datapath *dp, struct remote *r)
{
remote_rconn_run(dp, r, MAIN_CONNECTION);
if (!rconn_is_alive(r->rconn)) {
remote_destroy(r);
return;
}
if (r->rconn_aux == NULL || !rconn_is_alive(r->rconn_aux))
return;
remote_rconn_run(dp, r, PTIN_CONNECTION);
}
static void
remote_rconn_run(struct datapath *dp, struct remote *r, uint8_t conn_id) {
struct rconn *rconn = NULL;
if (conn_id == MAIN_CONNECTION) {
    rconn = r->rconn;           /* pick the main connection */
} else if (conn_id == PTIN_CONNECTION) {
    rconn = r->rconn_aux;       /* or the auxiliary one */
}
rconn_run(rconn);
...
for (i = 0; i < 50; i++) {
struct ofpbuf *buffer;
buffer = rconn_recv(rconn);
if (buffer == NULL) {
    break;                      /* no more queued messages */
}
struct ofl_msg_header *msg;
struct sender sender = {.remote = r, .conn_id = conn_id};
error = ofl_msg_unpack(buffer->data, buffer->size, &msg, &(sender.xid), dp->exp);
if (!error) {
error = handle_control_msg(dp, msg, &sender);
if (error) ofl_msg_free(msg, dp->exp);
}
if (error) {
struct ofl_msg_error err =
{{.type = OFPT_ERROR},
.type = ofl_error_type(error),
.code = ofl_error_code(error),
.data_length = buffer->size,
.data = buffer->data};
dp_send_message(dp, (struct ofl_msg_header *)&err, &sender);
}
ofpbuf_delete(buffer);
...
}
}
```
Each call attempts to receive at most 50 messages. This cap keeps the receive loop from monopolizing the CPU, so other processing cannot be starved. Once a well-formed buffer has been received, its contents are parsed by the OpenFlow message parser ofl_msg_unpack().
(4) Parsing OpenFlow messages
ofl_msg_unpack(), which parses the packet contents, is defined in oflib/ofl-messages-unpack.c. It first extracts the OpenFlow message type, then dispatches to the unpack function for that type. The main code is shown below (only a subset of the message types is listed):

```c
ofl_err
ofl_msg_unpack(uint8_t *buf, size_t buf_len, struct ofl_msg_header **msg, uint32_t *xid, struct ofl_exp *exp) {
struct ofp_header *oh;
size_t len = buf_len;
ofl_err error = 0;
oh = (struct ofp_header *)buf;
switch (oh->type) {
case OFPT_HELLO:
error = ofl_msg_unpack_empty(oh, &len, msg);
break;
...
case OFPT_EXPERIMENTER:
if (exp == NULL || exp->msg == NULL || exp->msg->unpack == NULL) {
OFL_LOG_WARN(LOG_MODULE, "Received EXPERIMENTER message, but no callback was given.");
error = ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_EXPERIMENTER);
} else {
error = exp->msg->unpack(oh, &len, (struct ofl_msg_experimenter **)msg);
}
break;
/* Switch configuration messages. */
case OFPT_FEATURES_REQUEST:
error = ofl_msg_unpack_empty(oh, &len, msg);
break;
/* Asynchronous messages. */
case OFPT_PACKET_IN:
error = ofl_msg_unpack_packet_in(oh,buf, &len, msg);
break;
/* Controller command messages. */
case OFPT_GET_ASYNC_REQUEST:
error = ofl_msg_unpack_empty(oh, &len, msg);
break;
case OFPT_PACKET_OUT:
error = ofl_msg_unpack_packet_out(oh, &len, msg, exp);
break;
case OFPT_FLOW_MOD:
error = ofl_msg_unpack_flow_mod(oh,buf, &len, msg, exp);
break;
...
/* Statistics messages. */
case OFPT_MULTIPART_REQUEST:
error = ofl_msg_unpack_multipart_request(oh,buf, &len, msg, exp);
break;
case OFPT_MULTIPART_REPLY:
error = ofl_msg_unpack_multipart_reply(oh,buf, &len, msg, exp);
break;
/* Barrier messages. */
case OFPT_BARRIER_REQUEST:
case OFPT_BARRIER_REPLY:
error = ofl_msg_unpack_empty(oh, &len, msg);
break;
/* Role messages. */
case OFPT_ROLE_REQUEST:
case OFPT_ROLE_REPLY:
error = ofl_msg_unpack_role_request(oh, &len, msg);
break;
/* Queue Configuration messages. */
case OFPT_QUEUE_GET_CONFIG_REQUEST:
error = ofl_msg_unpack_queue_get_config_request(oh, &len, msg);
break;
case OFPT_QUEUE_GET_CONFIG_REPLY:
error = ofl_msg_unpack_queue_get_config_reply(oh, &len, msg);
break;
case OFPT_METER_MOD:
error = ofl_msg_unpack_meter_mod(oh, &len, msg);
break;
default: {
error = ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
}
    return error;
}
```
Looking at how ofl_msg_unpack() dispatches to the per-type unpack functions, the handling of experimenter messages stands out. All messages defined by the OpenFlow protocol itself are unpacked inside the same oflib module as ofl_msg_unpack(), but the content of an experimenter message is vendor-defined. To keep experimenter messages extensible, the project dispatches through function pointers, which gives C something like C++ polymorphism and provides a very convenient hook for implementing custom message types (a sketch follows below). I will cover experimenter messages in detail in another post.
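The hook is the struct ofl_exp passed into ofl_msg_unpack(): its msg member bundles the vendor callbacks. A sketch of wiring in a custom unpacker; my_exp_unpack() is hypothetical, but its shape matches the exp->msg->unpack(oh, &len, ...) call above:

```c
/* Hypothetical vendor unpacker; its shape matches exp->msg->unpack above. */
static ofl_err
my_exp_unpack(struct ofp_header *oh, size_t *len,
              struct ofl_msg_experimenter **msg)
{
    /* decode the vendor payload in 'oh' into '*msg', consuming '*len' */
    return 0;
}

static struct ofl_exp_msg my_exp_msg = {
    .unpack = my_exp_unpack, /* plus .pack, .free, .to_string */
};
static struct ofl_exp my_exp = { .msg = &my_exp_msg };
/* Pass &my_exp as the 'exp' argument of ofl_msg_unpack(). */
```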
(5) Handling OpenFlow messages
After remote_rconn_run() has invoked the parser, and if parsing succeeded, it calls handle_control_msg() to act on the message. Like ofl_msg_unpack(), handle_control_msg() is defined in udatapath/dp_control.c and hands each message type to its own handler. The main code is shown below.

```c
/* Dispatches control messages to appropriate handler functions. */
ofl_err
handle_control_msg(struct datapath *dp, struct ofl_msg_header *msg,
const struct sender *sender) {
if (VLOG_IS_DBG_ENABLED(LOG_MODULE)) {
char *msg_str = ofl_msg_to_string(msg, dp->exp);
VLOG_DBG_RL(LOG_MODULE, &rl, "received control msg: %.400s", msg_str);
free(msg_str);
}
switch (msg->type) {
case OFPT_HELLO: {
ofl_msg_free(msg, dp->exp);
return 0;
}
case OFPT_ERROR: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_BARRIER_REQUEST: {
return handle_control_barrier_request(dp, msg, sender);
}
case OFPT_BARRIER_REPLY: {
ofl_msg_free(msg, dp->exp);
return 0;
}
case OFPT_FEATURES_REQUEST: {
return handle_control_features_request(dp, msg, sender);
}
case OFPT_FEATURES_REPLY: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_GET_CONFIG_REQUEST: {
return handle_control_get_config_request(dp, msg, sender);
}
case OFPT_GET_CONFIG_REPLY: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_SET_CONFIG: {
return handle_control_set_config(dp, (struct ofl_msg_set_config *)msg, sender);
}
case OFPT_PACKET_IN: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_PACKET_OUT: {
return handle_control_packet_out(dp, (struct ofl_msg_packet_out *)msg, sender);
break;
}
case OFPT_FLOW_REMOVED: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_PORT_STATUS: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_FLOW_MOD: {
return pipeline_handle_flow_mod(dp->pipeline, (struct ofl_msg_flow_mod *)msg, sender);
}
case OFPT_GROUP_MOD: {
return group_table_handle_group_mod(dp->groups, (struct ofl_msg_group_mod *)msg, sender);
}
case OFPT_PORT_MOD: {
return dp_ports_handle_port_mod(dp, (struct ofl_msg_port_mod *)msg, sender);
}
case OFPT_TABLE_MOD: {
return pipeline_handle_table_mod(dp->pipeline, (struct ofl_msg_table_mod *)msg, sender);
}
case OFPT_MULTIPART_REQUEST: {
return handle_control_stats_request(dp, (struct ofl_msg_multipart_request_header *)msg, sender);
}
case OFPT_MULTIPART_REPLY: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_ECHO_REQUEST: {
return handle_control_echo_request(dp, (struct ofl_msg_echo *)msg, sender);
}
case OFPT_ECHO_REPLY: {
return handle_control_echo_reply(dp, (struct ofl_msg_echo *)msg, sender);
}
case OFPT_QUEUE_GET_CONFIG_REQUEST: {
return dp_ports_handle_queue_get_config_request(dp, (struct ofl_msg_queue_get_config_request *)msg, sender);
}
case OFPT_ROLE_REQUEST: {
return dp_handle_role_request(dp, (struct ofl_msg_role_request*)msg, sender);
}
case OFPT_ROLE_REPLY:{
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_QUEUE_GET_CONFIG_REPLY: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_METER_MOD:{
return meter_table_handle_meter_mod(dp->meters, (struct ofl_msg_meter_mod *)msg, sender);
}
case OFPT_EXPERIMENTER: {
return dp_exp_message(dp, (struct ofl_msg_experimenter *)msg, sender);
}
case OFPT_GET_ASYNC_REPLY:{
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
case OFPT_GET_ASYNC_REQUEST:
case OFPT_SET_ASYNC:{
return dp_handle_async_request(dp, (struct ofl_msg_async_config*)msg, sender);
}
default: {
return ofl_error(OFPET_BAD_REQUEST, OFPBRC_BAD_TYPE);
}
}
}
```
One point to note here is memory management: "It is assumed that if a handler returns with error, it did not use any part of the control message, thus it can be freed up. If no error is returned however, the message must be freed inside the handler (because the handler might keep parts of the message)."
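In code the convention looks like this (schematic, mirroring the call site in remote_rconn_run() above):

```c
/* Ownership rule for control-message handlers:
 *   - handler returns an error -> it did not consume msg; the caller frees.
 *   - handler returns 0        -> the handler owns msg and frees it itself. */
ofl_err err = handle_control_msg(dp, msg, &sender);
if (err) {
    ofl_msg_free(msg, dp->exp);  /* caller's responsibility on failure */
}
```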
Processing Flow for OFS Data-Plane Packets
(1) Creating the datapath
The file udatapath/udatapath.c is also the entry point for packet forwarding in the switch. After the datapath has been created from the user's command, the main function hands it to the looping dp_run() to process the packets passing through that datapath. Specifically, dp_run() calls dp_ports_run() to receive the data-plane packets arriving at the datapath.
(2) Receiving packets
dp_ports_run() lives in the dp_ports module; it receives packets from the switch's ingress ports and hands them to the pipeline. Only the first line of the key excerpt survives here:

```c
static struct ofpbuf *buffer = NULL;
```
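Since the rest of the excerpt was lost, here is a hedged reconstruction of the receive loop in dp_ports_run() (udatapath/dp_ports.c), based on the behavior described next; the upstream code differs in its details:

```c
/* Hedged reconstruction of the loop inside dp_ports_run(); not verbatim. */
static void
dp_ports_run_sketch(struct datapath *dp)
{
    static struct ofpbuf *buffer = NULL;
    struct sw_port *p, *pn;

    LIST_FOR_EACH_SAFE (p, pn, struct sw_port, node, &dp->port_list) {
        if (buffer == NULL) {
            buffer = ofpbuf_new(1600);  /* room for an MTU-sized frame */
        }
        if (!netdev_recv(p->netdev, buffer)) {  /* frame copied off the NIC */
            struct packet *pkt = packet_create(dp, p->stats->port_no,
                                               buffer, false);
            buffer = NULL;  /* ownership moved into pkt */
            pipeline_process_packet(dp->pipeline, pkt);
        }
    }
}
```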
At this point the switch has finished receiving the packet: it has been copied from the NIC into the in-memory buffer. The state of the receiving port is then checked, and if the port is up, ownership of the packet is handed to the pipeline function pipeline_process_packet().
(3) Pipeline processing of the packet
pipeline_process_packet() takes ownership of the packet and is responsible for freeing its memory once processing completes. The pipeline module is implemented in udatapath/pipeline.c.

```c
void
pipeline_process_packet(struct pipeline *pl, struct packet *pkt) {
struct flow_table *table, *next_table;
...
next_table = pl->tables[0];
while (next_table != NULL) {
struct flow_entry *entry;
VLOG_DBG_RL(LOG_MODULE, &rl, "trying table %u.", next_table->stats->table_id);
pkt->table_id = next_table->stats->table_id;
table = next_table;
next_table = NULL;
entry = flow_table_lookup(table, pkt);
if (entry != NULL) {
if (VLOG_IS_DBG_ENABLED(LOG_MODULE)) {
char *m = ofl_structs_flow_stats_to_string(entry->stats, pkt->dp->exp);
VLOG_DBG_RL(LOG_MODULE, &rl, "found matching entry: %s.", m);
free(m);
}
pkt->handle_std->table_miss = is_table_miss(entry);
execute_entry(pl, entry, &next_table, &pkt);
/* Packet could be destroyed by a meter instruction */
if (!pkt)
return;
if (next_table == NULL) {
/* Cookie field is set 0xffffffffffffffff
because we cannot associate it to any
particular flow */
action_set_execute(pkt->action_set, pkt, 0xffffffffffffffff);
return;
}
} else {
/* OpenFlow 1.3 default behavior on a table miss */
VLOG_DBG_RL(LOG_MODULE, &rl, "No matching entry found. Dropping packet.");
packet_destroy(pkt);
return;
}
}
VLOG_WARN_RL(LOG_MODULE, &rl, "Reached outside of pipeline processing cycle.");
}
```
If the switch has multiple flow tables, the packet is matched against them in sequence. On a match, the instructions of the matching flow entry are executed, ending with action_set_execute(pkt->action_set, pkt, ...).