GPU Selection and Task Allocation

Blendr might use a variety of metrics to select GPUs for tasks, such as performance benchmarks (e.g., OctaneBench Points per Hour) and reputation scores.
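
As a rough illustration of how a benchmark figure and a reputation score could be combined into a single ranking, consider the minimal sketch below. It is not Blendr's actual selection logic: the `GpuNode` type, the `rank_gpus` function, the assumption that reputation lies in [0, 1], and the 0.7/0.3 weights are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNode:
    node_id: str
    benchmark_score: float  # e.g. an OctaneBench-style throughput figure
    reputation: float       # hypothetical reputation score, assumed in [0, 1]


def rank_gpus(nodes, benchmark_weight=0.7, reputation_weight=0.3):
    """Return nodes ordered from most to least preferred.

    The benchmark score is normalized against the best node so that both
    terms are on a comparable 0..1 scale before weighting. The weights are
    illustrative assumptions, not values taken from the source.
    """
    if not nodes:
        return []
    top = max(n.benchmark_score for n in nodes)

    def score(n):
        normalized = n.benchmark_score / top if top > 0 else 0.0
        return benchmark_weight * normalized + reputation_weight * n.reputation

    return sorted(nodes, key=score, reverse=True)


candidates = [
    GpuNode("a", benchmark_score=600.0, reputation=0.50),
    GpuNode("b", benchmark_score=300.0, reputation=0.99),
    GpuNode("c", benchmark_score=550.0, reputation=0.95),
]
best = rank_gpus(candidates)[0]  # node a task would be offered to first
```

With these made-up numbers, node "c" ranks first: it is slightly slower than "a" but has a much better reputation, which is the kind of trade-off a weighted score is meant to capture.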

DEV refers to any kind of device that can implement our DMA engine.

[...] that invokes the callback function of the triggered event. In TensorFlow, it takes ∼58.7μs for the callback thread to acquire the mutex lock after the polling thread releases it. This delay could be reduced to as low as 5μs if both threads ran on the same CPU core, but co-locating the threads, or even merging them into a single one, would increase the event polling interval as well as the overall processing time. Lastly, it is inefficient for the callback thread to deliver the computation command to GPU B. Delivering the event signal to GPU B would take only 2∼3μs if implemented efficiently, but we need to deliver the callback command binary as well. This extra delay could be avoided if we could deliver the GPU command ahead of time and trigger it later on the CPU side, but this is not supported by commodity GPUs.

Networked: all nodes are interconnected with other nodes in the P2P system, and the full set of nodes forms a connected graph. When the graph is no longer connected, the overlay network becomes partitioned.
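
The connectivity condition can be checked with a breadth-first traversal. The sketch below assumes the overlay is available as an undirected adjacency map (node to set of neighbours); the `is_connected` helper and this representation are illustrative assumptions, and a real overlay would have to detect partitions without such a global view.

```python
from collections import deque


def is_connected(overlay):
    """Return True if every node is reachable from an arbitrary start node.

    overlay: dict mapping each node to the set of its neighbours, assumed
    to describe an undirected graph. An empty overlay counts as connected.
    """
    if not overlay:
        return True
    start = next(iter(overlay))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in overlay.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Partitioned if some node was never reached from the start node.
    return set(overlay) <= seen


healthy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
partitioned = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
```

Here `is_connected(healthy)` holds, while `partitioned` splits into the two islands {a, b} and {c, d}, so the check fails.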

Decentralization: the behavior of the P2P system is determined by the collective actions of peer nodes, and there is no central control point. Some systems, however, secure the P2P system using a central login server.

The ability to manage the overlay and monetize its operation may require centralized elements.

Symmetry: nodes assume equal roles in the operation of the P2P system. In many designs this property is relaxed by the use of special peer roles such as super peers or relay peers.

Autonomy: participation of the peer in the P2P system is determined locally, and there is no single administrative context for the P2P system.

Scalable: this is a prerequisite for operating P2P systems with millions of simultaneous nodes, and means that the resources used at each peer grow less than linearly with overlay size.
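
To make the sublinear-growth requirement concrete, the sketch below compares two hypothetical per-peer state models: a full mesh, where each peer tracks every other peer (linear growth), and a logarithmic routing table of the kind used by many structured overlays. Both function names and the choice of base-2 logarithm are assumptions made for illustration, not part of any specific system described here.

```python
import math


def full_mesh_state(n):
    """Entries per peer if every peer tracks every other peer: grows linearly."""
    return max(n - 1, 0)


def log_table_state(n):
    """Entries per peer with a logarithmic routing table: grows sublinearly."""
    return math.ceil(math.log2(n)) if n > 1 else 0


for n in (1_000, 1_000_000):
    print(n, full_mesh_state(n), log_table_state(n))
```

At one million nodes the full mesh needs 999,999 entries per peer, while the logarithmic table needs 20, which is why only the second model satisfies the scalability property.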