By crowusbureno

's Largest BitTorrent System: The Benefits and Risks of Using Torrents in 2023



To send or receive files, users run a BitTorrent client on their Internet-connected computer. A BitTorrent client is a computer program that implements the BitTorrent protocol. BitTorrent clients are available for a variety of computing platforms and operating systems, including an official client released by Rainberry, Inc. Popular clients include μTorrent, Xunlei Thunder,[2][3] Transmission, qBittorrent, Vuze, Deluge, BitComet and Tixati. BitTorrent trackers provide a list of files available for transfer and allow the client to find peer users, known as "seeds", who may transfer the files.


The distributed nature of BitTorrent can lead to a flood-like spreading of a file throughout many peer computer nodes. As more peers join the swarm, the likelihood of a successful download by any particular node increases. Relative to traditional Internet distribution schemes, this permits a significant reduction in the original distributor's hardware and bandwidth resource costs. Distributed downloading protocols in general provide redundancy against system problems, reduce dependence on the original distributor,[14] and provide sources for the file that are generally transient, so there is no single point of failure as there is in one-way server-client transfers.








In May 2007, researchers at Cornell University published a paper proposing a new approach to searching a peer-to-peer network for inexact strings,[20] which could replace the functionality of a central indexing site. A year later, the same team implemented the system as a plugin for Vuze called Cubit[21] and published a follow-up paper reporting its success.[22]


In the early days, torrent files were typically published to torrent index websites, and registered with at least one tracker. The tracker maintained lists of the clients currently connected to the swarm.[1] Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. Azureus was the first[30] BitTorrent client to implement such a system through the distributed hash table (DHT) method. An alternative and incompatible DHT system, known as Mainline DHT, was released in the Mainline BitTorrent client three weeks later (though it had been in development since 2002)[30] and subsequently adopted by the μTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients.
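Trackerless operation works because every node and every torrent infohash in Mainline DHT lives in the same 160-bit identifier space, and a peer looks for the nodes whose IDs are "closest" to an infohash under the XOR metric borrowed from Kademlia (BEP 5). A minimal Python sketch of that ranking follows; the node IDs are made up by hashing arbitrary strings, and a real client keeps its contacts in a bucketed routing table rather than a flat list. The default of 8 results mirrors the bucket size K = 8 in BEP 5.

```python
import hashlib


def node_id(seed: str) -> int:
    """Derive an illustrative 160-bit ID from a string (SHA-1 digest as an integer)."""
    return int.from_bytes(hashlib.sha1(seed.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia/Mainline DHT distance: bitwise XOR read as an unsigned integer."""
    return a ^ b


def closest_nodes(target: int, nodes: list[int], k: int = 8) -> list[int]:
    """Return the k known node IDs closest to target under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


if __name__ == "__main__":
    # Hypothetical node IDs and infohash, for illustration only.
    nodes = [node_id(f"node-{i}") for i in range(100)]
    infohash = node_id("some torrent")
    for n in closest_nodes(infohash, nodes, k=3):
        print(f"{n:040x}  distance={xor_distance(n, infohash):040x}")
```

Sorting the whole list is fine for a sketch; a client instead asks the closest nodes it already knows for even closer ones, repeating until no closer nodes turn up.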


BitTorrent does not, on its own, offer its users anonymity. One can usually see the IP addresses of all peers in a swarm in one's own client or firewall program. This may expose users with insecure systems to attacks.[24] In some countries, copyright organizations scrape lists of peers, and send takedown notices to the internet service provider of users participating in the swarms of files that are under copyright. In some jurisdictions, copyright holders may launch lawsuits against uploaders or downloaders for infringement, and police may arrest suspects in such cases.


I2P provides an anonymity layer as well, although in that case one can only download torrents that have been uploaded to the I2P network.[34] The BitTorrent client Vuze allows users who are not concerned about anonymity to take clearnet torrents and make them available on the I2P network.[35]


On 2 May 2005, Azureus 2.3.0.0 (now known as Vuze) was released,[40] introducing support for "trackerless" torrents through a system called the "distributed database." This system is a distributed hash table implementation that allows the client to use torrents that do not have a working BitTorrent tracker. Instead, only a bootstrapping server is used (router.bittorrent.com, dht.transmissionbt.com or router.utorrent.com[41][42]). The following month, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client, which supported an alternative DHT implementation (popularly known as "Mainline DHT", outlined in a draft on their website) that is incompatible with that of Azureus. In 2014, measurements showed 10 million to 25 million concurrent users of Mainline DHT, with a daily churn of at least 10 million.[43]
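Mainline DHT messages (the KRPC protocol of BEP 5) are bencoded dictionaries sent over UDP, so a client that has just learned a bootstrapping node's address starts by sending queries such as `ping`. A minimal Python sketch of bencoding follows, applied to the ping query that BEP 5 uses as its own example (the transaction ID `aa` and the node ID `abcdefghij0123456789` are that example's placeholders, not real values). Actually sending the packet over UDP is left out.

```python
def bencode(value) -> bytes:
    """Encode an int, str/bytes, list or dict with BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%b" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# The ping query from BEP 5's examples; "aa" and the node ID are placeholders.
PING_QUERY = {"t": "aa", "y": "q", "q": "ping", "a": {"id": "abcdefghij0123456789"}}
print(bencode(PING_QUERY))  # b'd1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe'
```

The sorted-key rule matters: two clients that encode the same dictionary must produce identical bytes, which is also what makes infohashes of torrent metadata stable.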


A technique called broadcatching combines RSS feeds with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a column for Ziff-Davis in December 2003.[52] The discussion spread quickly among bloggers (Ernest Miller,[53] Chris Pirillo, etc.). In an article entitled Broadcatching with BitTorrent, Scott Raymond explained:


In August 2007, Comcast was preventing BitTorrent seeding by monitoring and interfering with the communication between peers. Protection against these efforts is provided by proxying the client-tracker traffic via an encrypted tunnel to a point outside of the Comcast network.[58] In 2008, Comcast called a "truce" with BitTorrent, Inc. with the intention of shaping traffic in a protocol-agnostic manner.[59] Questions about the ethics and legality of Comcast's behavior have led to renewed debate about net neutrality in the United States.[60] In general, although encryption can make it difficult to determine what is being shared, BitTorrent is vulnerable to traffic analysis. Thus, even with MSE/PE (Message Stream Encryption/Protocol Encryption), it may be possible for an ISP to recognize BitTorrent traffic, determine that a system is no longer downloading but only uploading data, and terminate its connection by injecting TCP RST (reset flag) packets.


The BitTorrent specification is free to use and many clients are open source, so BitTorrent clients have been created for all common operating systems using a variety of programming languages. The official BitTorrent client, μTorrent, qBittorrent, Transmission, Vuze, and BitComet are some of the most popular clients.[64][65][66][67]


Abstract: BitTorrent is a very scalable file-sharing protocol that utilizes the upload bandwidth of peers to offload the original content source. With BitTorrent, each file is split into many small pieces, each of which may be downloaded from different peers. While BitTorrent allows peers to effectively share pieces in systems with sufficient participating peers, the performance can degrade if participation decreases. Using measurements of over … trackers, which collectively maintain state information of a combined total of 2.8 million unique torrents, we identify many torrents for which the system performance can be significantly improved by re-allocating peers among the trackers. We propose a light-weight distributed swarm management algorithm that manages the peer torrents while ensuring load fairness among the trackers. The algorithm achieves much of its performance improvements by identifying and merging small swarms, for which the performance is more sensitive to fluctuations in the peer participation, and allows load sharing for large torrents.

1 Introduction

BitTorrent is a popular peer-to-peer file-sharing protocol that has been shown to scale well to very large peer populations [9]. With BitTorrent, content (e.g., a set of files) is split into many small pieces, each of which may be downloaded from different peers. The content and the set of peers distributing it is usually called a torrent. A peer that only uploads content is called a seed, while a peer that uploads and downloads at the same time is called a leecher. The connected set of peers participating in the piece exchanges of a torrent is referred to as a swarm. A client that wants to download a file can learn about other peers that share the same content by contacting a tracker at its announce URL. Each tracker maintains a list with state information of known peers that currently are downloading and/or uploading pieces of the file, and provides a peer with a subset of the known peers upon request.
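Splitting content into pieces only works if a downloader can check each piece no matter which peer sent it. In BitTorrent the torrent's metadata carries one 20-byte SHA-1 hash per piece (the `pieces` field of BEP 3), so the idea can be sketched in Python as below. The 16 KiB piece length is an arbitrary choice for the example (real torrents typically use larger pieces), and the function names are hypothetical.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split content into fixed-size pieces and return each piece's SHA-1 digest.

    The last piece may be shorter than piece_length, as in BitTorrent.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a piece received from any peer against the expected hash."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    content = b"x" * 100_000                    # stand-in for a file's bytes
    hashes = piece_hashes(content, 16 * 1024)   # 16 KiB pieces, for illustration
    print(len(hashes))                          # 7 (six full pieces, one partial)
    print(verify_piece(6, content[6 * 16 * 1024:], hashes))  # True
```

Because verification is per piece, a corrupt or malicious peer can only waste one piece's worth of bandwidth before being detected.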
Upon requesting the list of peers, the peer has to provide the tracker with information about its download progress. Additionally, the peers must inform the tracker about changes in their status (i.e., when they join, leave, or finish downloading the file). To avoid overloading trackers, the BitTorrent protocol only allows a peer to associate with one tracker per file that it is downloading (unless the tracker is no longer available and a new tracker must be contacted). Naturally, torrents may therefore have multiple parallel swarms.

While BitTorrent allows peers to effectively share pieces of popular torrents with many peers in a swarm, the performance of small torrents and swarms is sensitive to fluctuations in peer participation. Measurement data suggests that peers in small swarms achieve lower throughput on average (e.g., Figure 9 in [9]). Unfortunately, most swarms are small; several sources confirm that the popularity distribution of p2p content follows a power-law form, with a "long tail" of moderately popular files (see Figure 1(a) and, e.g., [1,3,4,5]). At the same time, the measurement data we present in this paper shows that many torrents consist of several swarms (see Figure 1(b)). Potentially, if one could dynamically re-allocate peers among the trackers such that multiple small swarms of a torrent are merged into a single swarm, then one could improve the file-sharing performance of the peers belonging to these torrents.

Motivated by these observations, the goal of our work is to evaluate the feasibility and the potential gains of dynamic swarm management for BitTorrent trackers. To support our evaluation, we performed measurements of … trackers, which collectively maintain state information of a combined total of 2.8 million unique torrents. We propose a light-weight distributed swarm management algorithm, called DISM, that also ensures load fairness among the trackers.
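The progress report and the join/finish/leave notifications described above travel in the tracker announce request. A sketch of building such a request in Python follows, using the HTTP query parameters defined in BEP 3: `left` carries the download progress, and `event` is `started`, `completed` or `stopped`, omitted on the regular periodic announces. The tracker URL, infohash and peer ID are placeholders, and handling the tracker's bencoded response is left out.

```python
from typing import Optional
from urllib.parse import urlencode

# Values BEP 3 defines for the optional "event" parameter.
EVENTS = {"started", "completed", "stopped"}


def announce_url(announce: str, info_hash: bytes, peer_id: bytes, port: int,
                 uploaded: int, downloaded: int, left: int,
                 event: Optional[str] = None) -> str:
    """Build an HTTP tracker announce URL with BEP 3 query parameters."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes")
    params = {
        "info_hash": info_hash,    # SHA-1 of the bencoded info dictionary, raw bytes
        "peer_id": peer_id,        # client-chosen 20-byte identifier
        "port": port,              # port this peer accepts connections on
        "uploaded": uploaded,      # total bytes uploaded so far
        "downloaded": downloaded,  # total bytes downloaded so far
        "left": left,              # bytes still missing: the progress report
    }
    if event is not None:
        if event not in EVENTS:
            raise ValueError(f"unknown event {event!r}")
        params["event"] = event
    # urlencode percent-escapes the raw bytes values.
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)


if __name__ == "__main__":
    url = announce_url("http://tracker.example.org/announce",  # hypothetical tracker
                       info_hash=bytes(range(20)), peer_id=b"-XX0001-123456789012",
                       port=6881, uploaded=0, downloaded=0, left=1_000_000,
                       event="started")
    print(url)
```

A peer joining a swarm would send `event=started`, periodic updates without `event`, `completed` when `left` reaches zero, and `stopped` when it leaves.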
Based on our measurement data and using DISM, we argue that dynamic swarm balancing could lead to a significant performance improvement in terms of peer throughput. We also briefly discuss alternatives for swarm management that could lead to similar performance improvements.

The remainder of the paper is organized as follows. Section 2 describes our design objectives and a light-weight distributed swarm management algorithm. Section 3 presents validation and performance results. Related work is discussed in Section 4. Finally, Section 5 concludes the paper.

2 Distributed Swarm Management

Ignoring the seed-to-leecher ratio, small swarms typically perform worse than larger swarms. However, for load sharing and reliability purposes it may be advantageous to split the responsibility of maintaining per-peer state information across multiple trackers, i.e., to allow several swarms to coexist. Consequently, dynamic swarm management should make it possible to (i) merge swarms belonging to a torrent if they become too "small", and (ii) split a swarm or re-balance swarms if the swarms become sufficiently "large". In general, it is hard to define when a swarm should be considered "small" or "large", but for simplicity, we assume that a swarm can be considered "large" if it has at least $T$ participating peers, for some threshold $T$. "Large" swarms are likely to have high piece diversity and are more resilient to fluctuations in peer participation.

Apart from these two properties, one would like to minimize the effect of swarm balancing on the traffic load of the trackers by (iii) avoiding a significant increase in the number of peers for any individual tracker (load conservation), and (iv) minimizing the number of peers that are shifted from one tracker to another (minimum overhead, especially shifts associated with torrents being re-balanced). The distributed swarm management (DISM) algorithm we describe in the following was designed with these four properties in mind.

Our algorithm allows swarms to be merged and to be split: swarms are merged whenever a swarm has fewer than $T$ participating peers, and a swarm is split (or re-balanced) over multiple trackers only if splitting the peers does not cause any swarm to drop below $T$ peers. The algorithm ensures that no tracker will see an increase of more than … peers in total (across all torrents), and typically much less, and it does not balance swarms that have at least $T$ peers each.

The DISM algorithm is composed of two algorithms. It relies on a distributed negotiation algorithm to determine the order in which trackers should perform pairwise load balancing. On a pairwise basis, trackers then exchange information about the number of active peers associated with each torrent they have in common (e.g., by performing a scrape), and determine which tracker should be responsible for which peers.

2.1 System Model

Table 1 defines our notation. In the following, we consider a system with a set of trackers $\mathcal{R}$, with a combined set of torrents $\mathcal{T}$. We will denote by $\mathcal{R}_t \subseteq \mathcal{R}$ the set of trackers that track torrent $t \in \mathcal{T}$, and by $\mathcal{T}_r \subseteq \mathcal{T}$ the set of torrents that are tracked by tracker $r \in \mathcal{R}$. Every torrent is tracked by at least one tracker (i.e., $|\mathcal{R}_t| \geq 1$), but the subsets $\mathcal{T}_r$ are not necessarily pairwise disjoint. Finally, let us denote the number of peers tracked by tracker $r$ for torrent $t$ by $x_{t,r}$, the total number of peers tracked by tracker $r$ by $x_r = \sum_{t \in \mathcal{T}_r} x_{t,r}$, and the total number of peers associated with torrent $t$ by $x_t = \sum_{r \in \mathcal{R}_t} x_{t,r}$.

Table 1: Notation
Parameter: Definition
$\mathcal{R}$: set of trackers
$\mathcal{T}$: set of torrents
$\mathcal{R}_t$: set of trackers that track torrent $t$
$\mathcal{T}_r$: set of torrents tracked by tracker $r$
$x_r$: number of peers tracked by tracker $r$
$x_t$: number of peers in torrent $t$
$x_{t,r}$: number of peers of torrent $t$ that are tracked by tracker $r$
$T$: threshold parameter

2.2 Distributed Negotiation

The distributed negotiation algorithm assumes that each tracker $r$ knows the set of torrents $\mathcal{T}_r \cap \mathcal{T}_{r'}$ that it has in common with each other tracker $r'$ for which the trackers' torrents are not disjoint (i.e., for which $\mathcal{T}_r \cap \mathcal{T}_{r'} \neq \emptyset$). Note that this information should be available through the torrent file, which should have been uploaded to the tracker when the torrent was registered with the tracker.

The algorithm works as follows. Tracker $r$ invites for pairwise balancing the tracker $r'$ for which the overlap in tracked torrents, $|\mathcal{T}_r \cap \mathcal{T}_{r'}|$, is maximal among the trackers with which it has not yet performed the pairwise balancing. Tracker $r'$ accepts the invitation if its overlap with tracker $r$ is maximal. Otherwise, tracker $r'$ asks tracker $r$ to wait until their overlap becomes maximal for $r'$ as well. The distributed algorithm guarantees that all pairs of trackers with an overlap in torrents will perform a pairwise balancing once and only once during the execution of the algorithm. If tracker $r'$ accepts the invitation, tracker $r$ queries $r'$ for $x_{t,r'}$, executes the pairwise balancing algorithm described below, and finally tells the results to tracker $r'$.

2.3 Pairwise Approximation Algorithm

Rather than applying optimization techniques, we propose a much simpler three-step greedy-style algorithm. First, peers are tentatively shifted based only on information local to each individual torrent. For all torrents that require merging (i.e., for which $x_{t,r} + x_{t,r'} < 2T$), all peers are tentatively shifted to the tracker that already maintains information about more peers for that torrent. For all torrents that should be re-balanced (i.e., for which $\min(x_{t,r}, x_{t,r'}) < T$ and $x_{t,r} + x_{t,r'} \geq 2T$), the minimum number of peers ($T - \min(x_{t,r}, x_{t,r'})$) needed to ensure that both trackers have at least $T$ peers is tentatively shifted to the tracker with fewer peers.

Second, towards achieving load conservation of $x_r$ and $x_{r'}$, the peer responsibility of some torrents may have to be adjusted. Using a greedy algorithm, the tentative allocations are shifted from the tracker that saw an increase in peer responsibility (if any) towards the tracker that saw a decrease in overall peer responsibility. To avoid increasing the number of peers associated with partial shifts, priority is given to shifting responsibilities of the torrents that are being merged.
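The first two steps of the pairwise algorithm can be sketched for one pair of trackers, here called r and s. This is one possible reading, not the authors' implementation. It assumes that a merge is warranted when the two swarms together hold fewer than twice the threshold and that re-balancing tops up the smaller swarm to exactly the threshold. Step two is read as flipping the target of merged torrents (each flip moves a whole swarm's worth of load) in order of decreasing size, never by more than the remaining imbalance, following the selection rule the text gives for step two. Re-balanced torrents are left alone in step two, step three is omitted, and all names are hypothetical.

```python
from typing import Dict, Tuple

# Sketch under stated assumptions; not the authors' DISM code.
# Hypothetical input: torrent -> (peers at tracker r, peers at tracker s).
# Assumes both counts are positive for every shared torrent.
Counts = Dict[str, Tuple[int, int]]


def step_one(counts: Counts, threshold: int) -> Dict[str, int]:
    """Torrent-local tentative shifts; value = net peers moved to r (negative: to s)."""
    shift = {}
    for t, (xr, xs) in counts.items():
        if xr + xs < 2 * threshold:
            # Too few peers for two "large" swarms: merge into the bigger one (ties -> r).
            shift[t] = xs if xr >= xs else -xr
        elif min(xr, xs) < threshold:
            # Enough peers for two large swarms: top up the smaller one to the threshold.
            need = threshold - min(xr, xs)
            shift[t] = need if xr < xs else -need
        else:
            shift[t] = 0  # both swarms already large: leave them alone
    return shift


def step_two(counts: Counts, shift: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Greedy load conservation that only flips the target of merged torrents."""
    shift = dict(shift)
    delta = sum(shift.values())  # > 0: r gained load, < 0: s gained load
    if delta == 0:
        return shift
    sign = 1 if delta > 0 else -1
    # Merged torrents whose peers were sent to the tracker that gained load.
    candidates = [(xr + xs, t) for t, (xr, xs) in counts.items()
                  if xr + xs < 2 * threshold and sign * shift[t] > 0]
    remaining = abs(delta)
    for size, t in sorted(candidates, reverse=True):  # largest adjustment first
        # Flipping the merge target changes the imbalance by the whole swarm size;
        # skip it if that would push the imbalance onto the other tracker.
        if size <= remaining:
            shift[t] -= sign * size
            remaining -= size
    return shift


if __name__ == "__main__":
    counts = {"a": (30, 10), "b": (25, 20), "c": (10, 200)}
    tentative = step_one(counts, threshold=50)
    print(tentative)                                  # {'a': 10, 'b': 20, 'c': 40}
    print(step_two(counts, tentative, threshold=50))  # {'a': 10, 'b': -25, 'c': 40}
```

In the example, step one leaves tracker r with 70 extra peers; flipping torrent "b" (45 peers) cuts that to 25, while flipping "a" as well (40 peers) would overshoot, so the residual imbalance is what step three would then address.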
Among these torrents, the algorithm selects the torrent that results in the largest load adjustment and that does not cause the imbalance in overall load to shift to the other tracker. By sorting torrents based on their relative shift, this step can be completed in $O(n \log n)$ steps, where $n$ is the number of torrents the two trackers have in common.

Finally, if overall load conservation is not yet fully achieved, additional load adjustments can be achieved by flipping the peer responsibilities of the pair of torrents that (if flipped) would result in the load split closest to achieving perfect load conservation (across all torrents), with ties broken in favor of choices that minimize the total shift of peers. Of course, considering all possible combinations can scale as $O(n^2)$. However, by noticing that only the torrents with the smallest shift for each load shift are candidate solutions, many combinations can be pruned. By sorting the torrents appropriately, our current implementation achieves … whenever … is finite.

We have also considered an alternative version in which priority (in step two) is given to allocations of the torrents that are being re-balanced. For these torrents, we (greedily) balance the load of the torrent with the largest imbalance first, until load conservation has been achieved or all such torrents have reached their balancing point. Results using this algorithm are very similar to those of our baseline algorithm and are hence omitted.

2.4 Implementation and Overhead

The proposed DISM algorithm could be implemented with a minor extension to the BitTorrent protocol. The only new protocol message required is a tracker_redirect message that could be used by a tracker to signal to a peer that the peer should contact an alternative tracker for the torrent.
The message would be used by a tracker for a torrent whose number of peers at that tracker decreases due to the execution of DISM. Peers that receive the message should contact another tracker they know about from the torrent file.

The communication overhead of DISM is dominated by the exchange of torrent popularity information between the trackers and by the redirection messages sent to the peers. Distributed negotiation involves one tracker scrape before every pairwise balancing, and the corresponding exchange of the results of the balancing. The amount of data exchanged between two trackers is hence proportional to the number of torrents they have in common. The number of redirection messages is proportional to the number of peers shifted between swarms, and is bounded by ….

3 Protocol Validation

[Figure 1: Basic properties of the multi-torrent, multi-swarm system. Panels: (a) torrent/swarm size, (b) number of swarms, (c) normalized CoV.]

[Figure 2: The average coefficient of variation (CoV) as a function of (a) peers, (b) leechers, and (c) seeds.]

3.1 Empirical Data Set

We used two kinds of measurements to obtain our data set. First, we performed a screen-scrape of the torrent search engine www.mininova.org. In addition to claiming to be the largest torrent search engine, mininova was the most popular torrent search engine according to www.alexa.com during our measurement period (Alexa rank of … in August …). From the screen-scrapes we obtained the sizes of about … files shared using BitTorrent, and the addresses of … trackers. Second, we scraped all the trackers for peer and download information of all the torrents they maintain. (Apart from interacting with and helping peers, a tracker also answers scrape requests at its scrape URL.) For the tracker-scrapes we developed a Java application that scrapes the scrape URL of each tracker. By not specifying any infohash, the tracker returns the scrape information for all torrents that it tracks. This allowed us to efficiently obtain the number of leechers, seeds, and completed downloads as seen by all trackers that we determined via the screen-scrape of mininova. We performed the tracker-scrapes daily from October … to October …, with all scrapes performed at … pm GMT. We removed redundant tracker information for trackers that share information about the same swarms of peers, and identified … independent trackers. Table 2 summarizes the data set obtained on October 10, 2008.
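The full-scrape measurement can be sketched in Python; the authors used a Java application, which this does not reproduce. Two conventions are assumed: the scrape URL is derived from the announce URL by replacing a final path component that starts with `announce` by `scrape` (BEP 48), and the bencoded response has a `files` dictionary keyed by 20-byte infohash whose entries give `complete` (seeds), `incomplete` (leechers) and `downloaded` (completed downloads). The HTTP request itself is omitted, the response below is hand-made, and the function names are hypothetical.

```python
def scrape_url(announce: str) -> str:
    """Derive a tracker's scrape URL from its announce URL (BEP 48 convention)."""
    head, sep, last = announce.rpartition("/")
    if not last.startswith("announce"):
        raise ValueError("announce URL does not follow the scrape convention")
    return head + sep + "scrape" + last[len("announce"):]


def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if c == b"d":
        i, result = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")


def swarm_counts(scrape_response: bytes) -> dict[bytes, tuple[int, int, int]]:
    """Map infohash -> (seeds, leechers, completed downloads) from a full scrape."""
    decoded, _ = bdecode(scrape_response)
    return {h: (s[b"complete"], s[b"incomplete"], s[b"downloaded"])
            for h, s in decoded[b"files"].items()}


if __name__ == "__main__":
    print(scrape_url("http://tracker.example.org/announce"))  # http://tracker.example.org/scrape
    # Hand-made response for one torrent whose infohash is b"A" * 20.
    response = (b"d5:filesd20:" + b"A" * 20 +
                b"d8:completei5e10:downloadedi50e10:incompletei10eeee")
    print(swarm_counts(response))  # {b'AAAAAAAAAAAAAAAAAAAA': (5, 10, 50)}
```

Running such a scrape once per tracker per day, then summing the per-infohash counts across trackers, yields the per-torrent and per-swarm sizes the measurement study relies on.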

