Rodi Introduction

Contact author of the logo yulenka_re@yahoo.com

Rodi or Rodia (Ρόδι or Ροδιά) means pomegranate in Greek. Rodi is a tiny (under 200K of binary code) P2P client/host implemented in pure Java. Its network usage is similar in concept to BitTorrent. The program will serve the filesharing community as well as the Open Source community by facilitating fast data/software deployment.

Some problems were discovered while evaluating BitTorrent.
  • HTTP based message exchange between the tracker and clients is too expensive, both in server CPU consumption and in upstream bandwidth. Compatibility issues between tracker and clients can be solved by using compact binary packets for the control stream. There is no specific reason to use ASCII packets. For example, a version field can be mandatory (MUST) in the packet and placed at a well known offset.
  • If the tracker fails there is no way to move the torrent file to another tracker and inform all seeds/clients that the tracker has moved. A new torrent file has to be created and the data distribution process has to start from the very beginning.
  • If a client finds after a restart that the tracker is down, there is no way to continue the download.
  • Nodes send "what I have" messages, which consume some upstream and part of the downstream of the destination peer. There is no reason to send anything to a peer without a direct request from that peer, and even then the request can be ignored to save upstream.
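The idea of a compact binary control packet with the version at a well known offset can be sketched in Java as follows. This is only an illustration: the layout (one version byte at offset 0, one message type byte, a two byte payload length) and the names ControlPacket, encode and versionOf are assumptions, not the actual Rodi wire format.

```java
import java.nio.ByteBuffer;

/** Hypothetical compact control packet; the layout is an assumption, not Rodi's wire format. */
final class ControlPacket {
    static final int VERSION_OFFSET = 0; // the version always lives at this well known offset
    static final int HEADER_SIZE = 4;    // version (1) + message type (1) + payload length (2)

    /** Builds header + payload. ByteBuffer writes multi-byte values big-endian by default. */
    static byte[] encode(int version, int type, byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload does not fit a 16-bit length field");
        }
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.length);
        buf.put((byte) version);
        buf.put((byte) type);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    /** Reads the version without parsing anything else in the packet. */
    static int versionOf(byte[] packet) {
        if (packet.length < HEADER_SIZE) {
            throw new IllegalArgumentException("packet shorter than the header");
        }
        return packet[VERSION_OFFSET] & 0xFF;
    }
}
```

A receiver can call versionOf first and decide whether it understands the rest of the packet before touching any other field, which is the compatibility benefit argued for above.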

In the typical application the client's downstream is significantly wider than its upstream, 4:1 or even more. This is especially noticeable for ADSL lines, which serve about 100 million subscribers (the number is valid for the end of 2003, source http://www.lightreading.com/). I can make the case that modern networks are reliable and require only a minimal set of flow control and retransmission features on top of a datagram transport (UDP). There is no reason to send data over TCP if the client is not going to use TCP's flow control. In the simplest scenario the host can use a best effort scheme and send packets to the client with no acknowledgement timeout. If the client fails to receive a packet, a new request can be issued at any time, assuming that the packet can still be found on the host. The client can optionally specify in its request to the host the optimal burst size (window size), packet size, inter-burst delay and inter-packet delay. In the best scenario the client will not issue any requests to the host besides the initial request. Because UDP is connectionless, no time is spent on establishing a peer to peer connection.
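To make the four optional request parameters concrete, here is a minimal Java sketch of how a host could turn them into a best effort send schedule with no acknowledgement timers. The class name TransferHints, its field names and the use of milliseconds are assumptions made for the example only.

```java
/** Hypothetical transfer hints a client could attach to its initial request. */
final class TransferHints {
    final int burstSize;          // packets per burst (window size)
    final int packetSize;         // bytes per packet
    final long interPacketMillis; // pause between packets inside one burst
    final long interBurstMillis;  // pause between the end of one burst and the start of the next

    TransferHints(int burstSize, int packetSize, long interPacketMillis, long interBurstMillis) {
        if (burstSize <= 0 || packetSize <= 0 || interPacketMillis < 0 || interBurstMillis < 0) {
            throw new IllegalArgumentException("invalid transfer hints");
        }
        this.burstSize = burstSize;
        this.packetSize = packetSize;
        this.interPacketMillis = interPacketMillis;
        this.interBurstMillis = interBurstMillis;
    }

    /** Number of packets needed for a file of the given size (rounded up). */
    long packetCount(long fileSizeBytes) {
        return (fileSizeBytes + packetSize - 1) / packetSize;
    }

    /** Time offset, from the start of the transfer, at which packet number index (0-based) is sent. */
    long sendOffsetMillis(long index) {
        long burst = index / burstSize;
        long positionInBurst = index % burstSize;
        long burstSpan = (burstSize - 1) * interPacketMillis; // first to last packet of one burst
        return burst * (burstSpan + interBurstMillis) + positionInBurst * interPacketMillis;
    }
}
```

With a burst size of 4, a 2 ms inter-packet delay and a 10 ms inter-burst delay, packets 0 to 3 go out at 0, 2, 4 and 6 ms and packet 4 starts the second burst at 16 ms. The host needs no per-client state beyond these numbers, so a lost packet is simply requested again by the client.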

Security is a huge problem for the existing BitTorrent network. In most cases a BitTorrent tracker accepts any client; in some cases the client has to go through a registration procedure run by a regular web server before it gains access to the tracker. Part of the registration procedure is saving the client IP address, which is assumed to be unique. Many questions immediately arise. It is not clear how the system can work if the client is behind a proxy and its real IP address is invisible to any 3rd party. What happens with dynamic IP addresses? How can the tracker be sure that the current request arrived from the client registered on the server and not from one with the same (faked?) IP address? How can the host make sure that a request arrived from an authorized client? How can the client make sure that the host storing the data is authorized?

Security Upstream. One possible solution is a secured registration procedure that sends 2 keys to the client via a protected channel: a client ID and a tracker ID. Besides the two keys supplied by the server, there is a nickname chosen by the client. When the client initiates a transaction it receives a session ID from the tracker; this is a third key, used for client - host communication. Every request from the client to the tracker contains the client nickname and is protected by an MD5 sum calculated over the whole data including the client ID, but the client ID itself is not sent over the line. Similarly, every request from the client to the host is protected by an MD5 sum calculated over the whole data including the tracker ID and session ID, but the tracker ID is not sent over the line. I assume here that all hosts have already received the tracker ID from the server and can calculate and check the MD5 of an incoming packet. A correct tracker ID means that the client belongs to the network. The combination of a correct client ID and client name means that the client specified a correct user name and has initial access to the tracker.
Here is how the different packets look for client to tracker and client to host communication (N/U - not used)
Client to tracker (for MD5 calculation):  Nickname | Client ID | Payload | N/U
Client to tracker (for sending):          Nickname | N/U | Payload | N/U
Client to host (for MD5 calculation):     Tracker ID | Session ID | Nickname | Payload
Client to host (for sending):             N/U | Session ID | Nickname | Payload
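The client to tracker case can be sketched in Java with the standard MessageDigest class. The class and method names below are hypothetical, and the plain concatenation of fields is an assumption; a real packet format would need fixed-length or length-prefixed fields so that field boundaries are unambiguous. MD5 is used because the text above names it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of signing a client to tracker request with a secret that is never sent. */
final class TrackerRequestSigner {

    /** MD5 over the concatenation of all parts, in order. */
    static byte[] md5(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Client side: the digest covers nickname + client ID + payload; only nickname, payload and digest are sent. */
    static byte[] sign(String nickname, byte[] clientId, byte[] payload) {
        return md5(nickname.getBytes(StandardCharsets.UTF_8), clientId, payload);
    }

    /** Tracker side: look up the stored client ID by nickname, rebuild the digest and compare. */
    static boolean verify(String nickname, byte[] storedClientId, byte[] payload, byte[] receivedDigest) {
        return MessageDigest.isEqual(sign(nickname, storedClientId, payload), receivedDigest);
    }
}
```

The tracker never sees the client ID on the wire. It can only reproduce the digest if the ID it stored at registration matches the one the client used, which is the check described above.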

Security Downstream. Similarly to the client, the tracker signs every packet with an MD5 sum calculated over the whole data including the tracker ID and session ID, but sends a packet which contains only the session ID and payload. Upon receiving the packet the client is expected to recover the initial packet using the stored tracker ID, calculate the MD5 sum and compare it with the received sum. When communicating with a host the client expects that all packets arriving from the host are signed with MD5. The host calculates the MD5 using the stored tracker ID and session ID, but sends only the payload and the MD5 sum.
Here is how the different packets look for tracker to client and host to client communication
Tracker to client (for MD5 calculation):  Nickname | Client ID | Payload | N/U
Tracker to client (for sending):          Nickname | N/U | Payload | N/U
Host to client (for MD5 calculation):     Tracker ID | Session ID | Nickname | Payload
Host to client (for sending):             N/U | Session ID | Nickname | Payload

Data encryption and how to fight P-Cube, Allot Communication, Expand Networks, Lancope Inc., Ellacoya Networks, Packeteer and similar solutions. In one possible scenario the payload is encrypted by the host, before it is sent over the line, with a private key generated by the tracker. The client receives the public key generated by the tracker as part of the initial handshake. In a different approach the whole data packet is encrypted, both in the data exchange between host and client and between client and tracker. Let's talk more about this. There are three separate problems the system can try to answer.

  • Fighting traffic analyzers
  • Protocol details/message exchange encryption
  • Data protection
Apparently an analyzer uses some simple rule based on IP address and port number to collect statistics, or even to drop packets if the ISP decides that the traffic is illegal or parasitic. More advanced analyzers support "deep inspection of packets, including the identification of layer-7 patterns and sequences". A P2P network can use some simple encoding algorithm, for example XOR with a long key. The strength of the scheme is determined by the length of the key, how frequently keys are renewed and the total number of keys. Let's assume that the key length is 1M characters and that there are 1M different keys, because hosts generate different keys for the published files. A reliable analyzer is then expected to store and actively use about 1T characters of keys. Let's also suppose that keys are made accessible to registered clients over different protocols, like e-mail, FTP, HTTP, etc. Because a high speed analyzer is normally a real-time embedded device, it can not reach such a goal as collecting 1 Tbyte of keys. See also Firewalls below.
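A minimal Java sketch of the XOR encoding and of the key storage estimate follows. The class and method names are hypothetical, and the key is simply reused from its start when the data is longer than the key, which is an assumption of this example.

```java
/** Hypothetical sketch of XOR encoding with a long key; names are assumptions for illustration. */
final class XorObfuscator {

    /** XORs every data byte with the key byte at the same position, wrapping around the key. */
    static byte[] apply(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    /** Storage an analyzer needs to hold every key: number of keys times key length. */
    static long totalKeyBytes(long keyCount, long keyLength) {
        return keyCount * keyLength;
    }
}
```

Applying the same key twice restores the original data, so the receiver uses the same method to decode. With 1M keys of 1M characters each, totalKeyBytes gives 10^6 x 10^6 = 10^12 characters, which is the 1T figure used above.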
It can also be argued that when an ISP sells a contract including 24/7 unlimited access at a specified bandwidth, the customer expects network performance to be application independent. Otherwise the ISP should state clearly in the contract that, for example, P2P or proprietary VoIP applications have low priority and services provided by the ISP's mail server have higher priority. Let's say that the IP address and port number can not be recovered by the intercepting system, for example because all nodes are behind proxy servers supporting packet encryption in both directions. This approach brings additional costs related to the maintenance of the proxy servers.
The P2P protocol can be defined in a way that helps the ISP to recognize replicas and cache the data locally. For example, a GetData request is a unique URL string and a GetData response is a binary packet. Such an ISP friendly approach could decrease network load significantly. A copyright issue arises here, because the cache effectively holds part (or even all) of probably copyrighted material, and the ISP cache participates in illegal sharing of that material. The issue is far from clear, because the ISP does not make any use of the information in the cache besides facilitating the data transfer; it is the hosts who store and publish the material (see http://www.joltid.com/ and http://zdnet.com.com/2100-1104_2-1027508.html for more on caching). A typical internet browser stores information in a temporary files cache, and this is not considered an infringement of copyright law.
Data protection is the trickiest one. The network consists of trusted and untrusted nodes. Let's say that the tracker is trusted by both the host and the client. Let's also say that all nodes are protected by proxies and real IP addresses can not be recovered from the packet. We need some procedure for exchanging keys between host and client. The host encrypts the data and publishes the key and data title on the tracker. As part of the publishing procedure the host specifies which clients have access to the data and the encryption key. The host can generate a separate key for every trusted client using the client's nickname. This way the data can not be accessed without one of a limited number of keys. When the host sends data to a trusted client it uses the corresponding key to encrypt the data; the client must use the key provided by the host, and only this key, to access the data. To avoid on the fly encryption the host can use a single public key for encryption. This way the host can cache already encrypted data for future use. Note that thanks to the MD5 signature the tracker and host are sure that the client is authorized to access the data. Upon receiving the data the client uses its personal and unique key to open the packet and check the MD5. Now let's say that the tracker is not a trusted node. The only solution I see at the moment is using a separate medium (like e-mail) for key exchange, or some trusted server belonging to a 3rd party.
Anonymity. There are two different problems here
  • Real 100% Anonymity
  • Fight intercepting equipment
Let's say that a powerful 3rd party, let's call them the adversary, installs a client and starts to log IP addresses belonging to the network. The adversary assumes that no bouncers are used by the network, otherwise the data collected this way is not reliable. In other words, if the adversary intercepts a packet containing the information it is looking for, this does not necessarily mean that the source IP of the packet belongs to the node hosting the data. Note that the burden of proof is on the adversary.
A bouncer does not provide 100% anonymity, because the adversary can still attack the system using statistical analysis of response times, interception of ALL incoming and outgoing requests of a specific node, etc. Still, the concept looks good enough for many applications. Rodi uses UDP packets for both data and control messages (like search). It can be argued that the source IP of any packet can be faked and that a traffic log can not be regarded as proof that a specific host sent the packet. Let's say that the adversary sends a data request to a specific IP address and receives a reply: a packet containing some other source IP and the data. The publisher of the data can argue that the data request was handled by some other node. In the real network we can bounce data requests while data transfers remain P2P. Important! We use a connectionless protocol like IP. In the case of IP it is enough to specify the correct destination IP for data delivery. All retransmission requests are routed through the network.
Rodi provides limited routing capabilities. Optionally a publisher can choose to use its randomly generated ID (see the MUTE filesharing network for details) instead of its IP address. A client looking for the data receives the publisher ID and not the publisher IP address. Every client stores a routing table to support anonymous operations and allocates part of its upstream to routing. The rule here is: if you don't route, you don't get routing services from your neighbors. This is also called an Autonomous Reputation Scheme or an autonomous tit-for-tat mechanism.
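One way to picture the "if you don't route, you don't get routing services" rule is a small per-neighbor ledger. The Java sketch below is only an assumption about how such a rule could be kept locally: the class RoutingLedger, the string neighbor IDs and the initial credit for unknown neighbors are all hypothetical and are not taken from Rodi or MUTE.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical local tit-for-tat ledger: route for a neighbor only while it routes for us. */
final class RoutingLedger {
    // neighbor ID -> packets it routed for us minus packets we routed for it
    private final Map<String, Integer> balance = new HashMap<>();
    // packets we are willing to route for a neighbor before it has routed anything for us
    private final int initialCredit;

    RoutingLedger(int initialCredit) {
        this.initialCredit = initialCredit;
    }

    /** Record that the neighbor forwarded one of our packets. */
    void neighborRoutedForUs(String neighborId) {
        balance.merge(neighborId, 1, Integer::sum);
    }

    /** Returns true and charges the neighbor if we agree to forward one packet for it. */
    boolean tryRouteFor(String neighborId) {
        int current = balance.getOrDefault(neighborId, 0);
        if (current + initialCredit <= 0) {
            return false; // the neighbor has used up its credit and gets no routing service
        }
        balance.put(neighborId, current - 1);
        return true;
    }
}
```

Each node decides on its own, from its own counters, so no central reputation server is needed, which is what makes the scheme autonomous.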
A node can stamp outgoing packets with an arbitrary source IP (usually this is not possible if you are behind a DSL/cable modem or NAT). Rodi can ignore the source IP of the packet and fetch the contact information from the payload.
Another approach to the problem of anonymous internet access is WiFi. An ISP can sell service not on a monthly subscription basis but pay-per-day/pay-per-hour. For example, imagine an ISP providing services in an apartment building or on a train. The customer simply buys a card with a key/coupon and pays in cash (the coupon can be printed on the train ticket). The ISP does not have to keep billing information. The MAC address and the DHCP server log are the only way to find out which desktop was using a specific IP. Even if the ISP keeps DHCP logs, the client can always argue that anybody could fake the MAC, because the MAC address is configurable on most Ethernet cards.
Bouncers. Let's call the publisher server P, the downloader D and some other peer B (bouncer). Let's also assume that the protocol is IP based. P never accepts data/lookup requests directly from D. D sends a packet to B with its own (D) source IP both in the IP header and in the "get data" request. B forwards the packet to P with its own (B) IP address in the IP header. P receives the packet and checks that the source IP in the header is not the same as the one in the request and/or that the source IP (the IP of B) belongs to a friendly host (a group security server, for example). P sends the data directly to D.
Essentially the idea here is to bounce control messages but send the data itself directly. Because the protocol is UDP based it can not be reliably proved that P published the data: P can always argue that the IP address was spoofed by B. B has no idea about the actual data transfer because it sees only the "get data" request and not the data itself. D can argue that it sent a request to one peer but received (or did not receive) data from another, and that the data was not what it (D) was looking for.
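The check P performs on a bounced request can be sketched in Java as follows. The class PublisherGate and its method names are hypothetical, and the sketch assumes P applies both checks (header source differs from the requester, and header source is a known friendly bouncer), which is only one reading of the "and/or" above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of publisher P deciding whether to serve a bounced "get data" request. */
final class PublisherGate {
    private final Set<InetAddress> friendlyBouncers = new HashSet<>();

    void trust(InetAddress bouncer) {
        friendlyBouncers.add(bouncer);
    }

    /**
     * headerSource is the address from the IP header (expected to be B);
     * requester is the address D wrote into the request payload.
     * Returns the address the data should be sent to, or empty if the request is ignored.
     */
    Optional<InetAddress> replyTarget(InetAddress headerSource, InetAddress requester) {
        if (headerSource.equals(requester)) {
            return Optional.empty(); // request came straight from D: never accepted
        }
        if (!friendlyBouncers.contains(headerSource)) {
            return Optional.empty(); // forwarded by a host P does not know
        }
        return Optional.of(requester); // control was bounced, data goes directly to D
    }

    /** Parses a literal IP address; literals are not resolved through DNS. */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }
}
```

B only ever handles the request, and the reply target comes from the payload rather than from the header, so the data path never touches the bouncer.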
While the adversary can easily find all participants (P, D and B), it is much harder to prove that the data transfer indeed took place.
Let's say that B is the adversary - it sees only the data request.
Let's say that the adversary is D - there is no proof that incoming packets with P's IP address were indeed generated by P.
Let's say that the adversary is P. D can argue that it did not receive a single packet and, even more, that D never asked for the data in the first place and it is B which did the work.
Let's say that the adversary runs both B and D. P can choose bouncer(s) randomly among existing peers or from a list of well known IP addresses. This way the adversary can never run B.
The only way the adversary can attack the network is to log traffic from all three participants: P, D and B. One solution a mighty adversary can use is to sign an agreement with all service providers (ISPs) under which ISPs are obliged to log the traffic of any peer upon request, and the request can come in real time. Let's say that all ISPs open access to their routers for the adversary. While this is possible, it is not easy for a large scale network.
Another application for bouncers is a centralized security server holding encryption keys for the control part of the protocol. Distinct pairs of keys can be generated for all participants, and the security server is the only place where all keys (public and/or private) are kept.
IP network topology awareness, internal routing protocol, looking for the shortest path. TBD
Firewalls. Using one and only one IP port (for example, HTTP - 80) for all communication can help to get through some firewall configurations. Still, in some cases a stronger scheme should be implemented. The system can fake the look&feel of a real RTP or NFS packet (IP port number, protocol type, etc.) and the destination node can still process the packet as a regular one. The system does not make use of TCP flow control features and essentially implements its own packet delivery protocol (it can be, for example, a flavor of the Frame Relay protocol). In the first phase I am going to fake RTP packets. It makes sense, because
  • RTP runs over UDP, so it is natural for the UDP sockets created by the application and does not require native (non-Java) methods to implement.
  • RTP creates symmetrical traffic (upload is roughly equal to download), so the adversary can't use statistical analysis to find nodes with abnormal behaviour
  • It is easy for the receiver to work with both faked packets (containing an RTP header) and regular packets. The receiver can attempt to read the packet as non-RTP and, if that fails (legal message event not found, etc.), try again from a 12 byte offset (the RTP header size)
  • The payload is expected to contain binary data, and spying equipment will not attempt to parse it
Among other convenient UDP based protocols, the application can use DNS (port 53).
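The "read as regular, then retry at a 12 byte offset" rule can be sketched in Java like this. The 12 byte header follows the fixed RTP header layout (version, payload type, sequence number, timestamp, SSRC). Everything else is an assumption for the example: the class RtpDisguise, and in particular the MAGIC byte that stands in for whatever test "legal message" means in the real protocol.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of wrapping a message in a fake RTP header and finding it again. */
final class RtpDisguise {
    static final int RTP_HEADER_SIZE = 12; // fixed RTP header without CSRC entries
    static final byte MAGIC = 0x52;        // assumed first byte of a legal regular message

    /** Prepends a minimal RTP header: V=2, no padding, no extension, no CSRC, marker bit clear. */
    static byte[] wrap(byte[] message, int payloadType, int sequence, int timestamp, int ssrc) {
        ByteBuffer buf = ByteBuffer.allocate(RTP_HEADER_SIZE + message.length);
        buf.put((byte) 0x80);                 // version 2 in the two top bits
        buf.put((byte) (payloadType & 0x7F)); // 7-bit payload type
        buf.putShort((short) sequence);       // 16-bit sequence number
        buf.putInt(timestamp);                // 32-bit timestamp
        buf.putInt(ssrc);                     // 32-bit synchronization source ID
        buf.put(message);
        return buf.array();
    }

    /** Stand-in for the real "is this a legal message" test. */
    static boolean isLegalMessage(byte[] packet, int offset) {
        return packet.length > offset && packet[offset] == MAGIC;
    }

    /** Returns the offset at which the message starts, or -1 if the packet is not recognized. */
    static int messageOffset(byte[] packet) {
        if (isLegalMessage(packet, 0)) {
            return 0;                // regular packet
        }
        if (isLegalMessage(packet, RTP_HEADER_SIZE)) {
            return RTP_HEADER_SIZE;  // faked RTP packet: skip the header
        }
        return -1;
    }
}
```

Because a faked packet always starts with 0x80, it can not be mistaken for a regular message in this sketch, so the receiver handles both kinds of packets on the same UDP socket.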
IP Multicast. The next step from UDP seems obvious: using multicast to serve multiple clients. TBD

I will appreciate any comments on the project, especially from IT professionals, web managers and ISPs.



See also:
BitTorrent Specification
Filesharing applications http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/
MUTE http://mute-net.sourceforge.net/
GNUnet http://www.gnu.org/software/gnunet/
RTP header http://www.networksorcery.com/enp/protocol/rtp.htm
UDP data transfer http://dsd.lbl.gov/DIDC/PFLDnet2004/talks/Grossman-slides.pdf
Multicast http://www.nwfusion.com/details/502.html
Multicast security http://www.live.com/mcastfw.html
HP tech report "Peer-to-Peer Computing Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, ..." http://www.hpl.hp.com/techreports/2002/HPL-2002-57.pdf
P2P and related SW http://www.slyck.com/programs.php
Another P2P news web site http://p2pnet.net/
P2P statistics (Sep, 2004) http://www.itic.ca/DIC/News/2004/09/02/P2P_Statistics_August_2004.en.html
P2P and SIP http://www.techweb.com/wire/networking/47900119
Report on worldwide digital piracy http://www.itic.ca/DIC/News/index.html
P2P Gets Serious http://www.lightreading.com/document.asp?doc_id=56091
Why DRM is bad http://www.craphound.com/msftdrm.txt
Publications on Coral project web site http://www.scs.cs.nyu.edu/coral/overview/
More on IRIS here http://www.newscientist.com/news/news.jsp?id=ns99992861 (sponsored by National Science Foundation)
More research publications from Chord http://pdos.lcs.mit.edu/chord/
Self-certifying File System (SFS) http://www.fs.net/sfswww/
Berkeley DB http://www.sleepycat.com/
Wavelet Transform Tutorial http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html
High resolution wallpapers from Dali, Picasso http://www.rasiel.com/
Version Control System Comparison http://better-scm.berlios.de/comparison/comparison.html
Keyword Matching http://www.limewire.org/techdocs/KeywordMatching.htm
Java code examples http://www.limewire.org/techdocs.shtml
Open Source JVM http://sablevm.org/docs.html
Java to ELF compiler http://jcvm.sourceforge.net/
Another Java compiler http://gcc.gnu.org/java/
Subversion Version Control http://svnbook.red-bean.com/
Performance Comparison of Java/.NET Runtimes http://www.shudo.net/jit/perf/index.html
Project WASTE documentation http://waste.sourceforge.net/docs/docs.html
Project MNET documentation http://mnetproject.org/repos/mnet/doc/
More details on traffic shapers http://www.broadband-pbimedia.com/ct/archives/0703/0703_pondering.html
and more comments from lightreading http://www.lightreading.com/boards/message.asp?msg_id=93718
Another way to secure access to the server "Port Knocking" http://www.portknocking.org/
"The longest and most comprehensive measurement study" of Bittorrent http://pds.twi.tudelft.nl/~pawel/pub/bittorrent.pdf
Streaming P2P and Bittorrent http://www.ifi.uio.no/dmms/papers/129.pdf

Rodi GUI Spec

Let's keep it as simple as possible. The Rodi core provides access to the internal databases and management functions. The core opens a management socket and waits for requests from the management application. The management application can be a simple CLI or a graphic, color rich GUI, can be written in any language and can run on any system. This project develops two interfaces: a CLI and a light GUI. The CLI provides menus to print current statistics, configure system parameters, start/stop upload/download, publish data, etc.
The GUI resembles the CLI in look&feel. The GUI is table based, JDK 1.2 compliant (no Swing elements), lightweight, and distributed as a JAR file and source (click example). Binary code limitation: the GUI is not expected to be larger than the core. Because the management interface is provided through a socket, browser based interfaces can be easily developed. The user fills the address line of the browser with a line like http://localhost:6969. The browser downloads the GUI code from the engine and starts it. Any language can be used for the GUI: HTML, JavaScript, Java, Flash.
Screen shots

This is another logo for the project. You can vote for the best logo by email: larytet@yahoo.com
Contact author of the logo yulenka_re@yahoo.com


