forked from nmathewson/mixminion
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathHACKING
232 lines (195 loc) · 11.3 KB
/
HACKING
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
Hacking Mixminion
HOW TO CONTRIBUTE PATCHES:
- Send them to the list mixminion-dev@freehaven.net, or to me
(nickm@freehaven.net).
- Include a statement saying that you give permission for me to
redistribute your work under any FSF-approved license Mixminion
may use in the future, including (but not limited to) LGPL, GPL,
modified (3-clause) BSD, X11, or MIT.
System requirements:
Python >= 2.0 (see PORTING NOTES below)
OpenSSL 0.9.7 (If you don't have it, the Makefile knows how to
download it for you.)
NON-HACKING PROJECTS:
- We need documentation.
- See the spec for open issues.
- We need a logo. (An alien with an eggbeater or an alien at a DJ's mixer
are two ideas.)
THINGS TO HACK: (Good introductory projects that I won't get to myself for
at least a version or two.)
- We need some distributed stress-test code. Ideally, it should use SSH to
build Mixminion on a number of machines and start servers on those
machines, then run a bunch of clients to send messages through the system.
It should be possible to run the code with different network and server
configurations. (Difficulty: moderate. Invasiveness: none.)
- The current implementations for the MBOX and SMTP modules open a new
connection to the local MTA for each outgoing message. This is
inefficient; they should batch as much as possible. (Difficulty: easy.
Invasiveness: slight.)
- The current implementation for the MBOX and SMTP modules do not support
ESMTP (over TLS). They should. (Difficulty: easy. Invasiveness: slight.)
- In addition to the stress-test code above, we should have some automated
integration test code to start many servers with different configurations
on the same server, and test them with calls to the client code. This can
probably share a good deal of logic with the stress-test code. A nice
extra would be to allow testing multiple versions of Mixminion with
multiple versions of the client, and have multihost support. (Difficulty:
moderate. Invasiveness: none.)
- It would be neat if all the boilerplate that servers spit out were
configurable via some kind of generic boilerplate mechanism, and stored in
separate files. (Difficulty: easy. Invasiveness: slight.)
- We could use some way to cap total disk usage, and handle full disks sanely.
(Difficulty: moderate. Invasiveness: moderate.)
- If you have access to a multiprocessor machine, it would be nice to make
good use of more than one CPU. Right now, the network code runs in
parallel with the processing code, but the processing thread accounts (I
think) for most of the CPU use. It would be nice to support multiple
processing threads and multiple network threads (round-robin, not
one-per-connection). (Difficulty: easy/moderate, depending on your
knowledge of writing multithreaded code. Invasiveness: slight.)
- We have a specification for multiple exit addresses, but it needs to be
implemented... (Difficulty: easy. Invasiveness: slight.)
THINGS TO THINK ABOUT AND HACK: (Introductory projects that will take some
specification work. Please, get your spec discussed on mixminion-dev and
checked into CVS *before* you submit an implementation for any of these!
[See spec-issues.txt for more open issues].)
- We should have an incoming email gateway for users to use reply blocks to
send messages anonymously without using Mixminion software. (Spec
difficulty: easy. Implementation difficulty: moderate. Invasiveness:
slight.)
- Want a real challenge? We have an allusive description for how to do
K-of-N fragmentation in our E2E-spec documentation. Go flesh out the
description. (Spec difficulty: moderate. Implementation difficulty:
already implemented.)
- Right now, we never generate link padding or dummy messages. The code is
there, but it never gets triggered. Specify when it gets triggered (and
justify why this improves anonymity). (Spec difficulty: ????.
Implementation difficulty: easy once you know how. Invasiveness: some.)
- We could use IPv6 support. The big specification problem here is routing:
an IPv4-only server simply cannot deliver to a server without an IPv4
address. Any path-generation algorithm I can come up with has troublesome
anonymity implications. If you can come up with an algorithm that
doesn't, the code should be pretty easy to do. (Spec difficulty: ????.
Implementation difficulty: easy. Invasiveness: some.)
HARD HACKING:
- Do you want to dive into Python and OpenSSL internals? Right now, we allow
all our data to get swapped out to disk. That's no good! Now, you _could_
install an OS that encrypts your swap, but that's not an option for
everybody. Write code to use memlock to protect sensitive data structures
(keys, packets, etc) as necessary. [If you're feeling unsubtle, use
memlockall to keep *all* data from getting swapped out... but watch out for
ticked-off admins!] (Difficulty: easy/hard depending on whether you take
the easy way out with memlockall, or whether you actually do the right
thing. Invasiveness: all over the place.)
- Want a real challenge? Right now, we store sensitive files right on the
file system, and do our best to overwrite them when they need to be
deleted... but go read the comment in Common.py to see why this doesn't
really work. We *could* have people use encrypted filesystems.... but
that's not an option for everyone. Here's what you do: Write code for a
generic encrypted filestore, and have Mixminion use that as appropriate
instead of the filesystem directly. This will be easier than writing a
real encrypted filesystem, since:
- All of the files are the same size (within a few K).
- There are no hard or soft links
- There are no attributes: no ownership, no modes, no special files, no
atime/mtime/ctime... just name and size.
- There is no arbitrary nesting of directories; the set of directories
is very small.
- No more than one process needs to be able to access the filestore at
a time. (Though multiple threads might need to.)
Your code should probably be generic enough for other Python projects to
use. :) (Difficulty: hard. Invasiveness: moderate.)
- Port the code to use NSS or libgcrypt/GNUTLS instead of/in addition to
OpenSSL. (The OpenSSL license conflicts with the GPL, and makes us
unlinkable with GPL'd code under some circumstances.) Note that NSS does
not, today, have any support for server-side DHE; if you're going to go
down the NSS route, you should contribute an implementation for server-side
DHE. (Difficulty: hard. Invasiveness: moderate.)
- We should eventually port to Twisted, which looks to be an incredibly cool
and featureful networking platform written in Python. Issues:
- I don't want to switch Twisted's favored crypto library: PyOpenSSL.
This library, however cool, lacks a lot of symmetric cipher
functionality I need, and I really don't want to carry _two_ crypto
library dependencies. This means that we need to either bring
PyOpenSSL up to date, or change Twisted's SSL wrapper to speak
_minionlib's admittedly limited OpenSSL dialect.
- Twisted has functionality for a few things that Mixminion's libraries
currently duplicate, such as logging. We'll need to integrate these.
- I'd rather keep the changes one-by-one, and not have to port the
entire codebase in one step. First, we'd change the async loop, then
change more and more of the other IO-intensive stuff to use it.
- The install process MUST NOT get hard! Requiring users to install
Twisted beforehand could get tricky. We'll need to either make the
Twisted install project learn all of our Python-Python-Who's-Got-The-
Python trickery, or add a 'download and install twisted' target.
Benefits:
- Move server HTTP activity into async loop.
- Freebies like windows portability, bandwidth throttling, etc.
(Investigate more)
(Difficulty: moderate. Invasiveness: moderate.)
- After porting to GnuTLS or NSS, separate out all of the C code to its own
process! Feel free to refactor the code while you're at it.
- We could really use a nymserver.
DESIGN PRINCIPLES:
- It's not done till it's documented.
- It's not done till it's tested.
- Don't build general-purpose functionality. Only build the
operations you need.
- "Premature optimization is the root of all evil." -Knuth
Resist the temptation to optimize until it becomes a necessity.
CODING STYLE:
- See PEP-0008: "Style Guide For Python Code". I believe in most of it.
(http://www.python.org/peps/pep-0008.html)
- Also see PEP-0257 for documentation; we're not there yet, but it's
happening. (http://www.python.org/peps/pep-0257.html)
- Magic strings:
"XXXX" indicates a defect in the code.
"FFFF" indicates a missing feature.
"????" indicates an untested or iffy block.
"DOCDOC" indicates missing documentation.
PORTABILITY NOTES:
- I've already backported to Python 2.0. (I refuse to backport to 1.5 or
1.6.)
- Right now, we're dependent on OpenSSL. OpenSSL's license has an
old-style BSD license that isn't compatible with the GPL. We
have two other options, it seems:
- libnss: this is a dual-license GPL/MPL library from
Mozilla. Sadly, we can't use it now, because it doesn't
yet support server-side DHE. Bugzilla says that
server-side DHE is targeted for 3.5. Perhaps then we can
port, but I wouldn't hold my breath.
- gnutls/libgcrypt: These are the GNU offerings; the relevant
portions of each are licensed under the LGPL. They don't
support OAEP, but we've already got an implementation of that
in Python.
So for now, it's OpenSSL. I'll accept any patches that make us
run under gnutls/libgcrypt as well, but I think in the long term
we should migrate to libnss entirely.
CAVEATS:
- If I haven't got a test for it in tests.py, assume it doesn't work.
- The code isn't threadsafe. It will become so only if it must.
FINDING YOUR WAY AROUND THE CODE.
- All the C code is in src/. Right now, this is just a set of thin
Python wrappers for the OpenSSL functionality we use.
- The Python code lives in lib/mixminion (for client and shared code)
and lib/mixminion/server (for server code only). The main loop is
in lib/mixminion/server/ServerMain; the asynchronous server code is
in lib/mixminion/server/MMTPServer.
- A message coming in to a Mixminion server takes the following path:
Received in MMTPServer;
V
Stored in an 'Incoming' queue, implemented in ServerMain and Queue.
V
Validated, decrypted, padded by code in PacketHandler.
V
Stored in a 'Mix' pool, implemented in ServerMain and Queue.
V
A batch of messages is selected by the 'Mix pool' for delivery.
V
---Some are placed in an 'Outgoing' queue...
I V
I ... and delivered to other Mixminion servers via MMTPServer
V
---Others are queued and delivered by other exit methods,
implemented and selected by ModuleManager.
--Nick