There was a major Skype outage last week, and though there is an “official explanation” and other discussion of it floating around, I found this comment from one of the GigaOm readers more interesting to think about. The description may not be accurate (it may well be speculation), but it does describe, in a few words, how Skype’s P2P network scales out. You should also take a look at the detailed discussion of the Skype protocol here.
Number of Skype Authentication servers:
Count = 50 // Clustered
Number of potential Skype clients:
Count = 220,000,000 // Mostly decentralized
Number of SuperNode clients to maintain network connectivity:
Count = N / 300 at any one time, where N is the number of users online.
• If there are 3.0 million users online, then 3,000,000 / 300 = 10,000 Supernodes are available.
• Supernodes are the bootstraps into the network for normal first-run clients (“and handle routing of children calls”).
• Supernodes maintain the network overlay via a DHT (“Distributed Hash Table”) “type” method. // This is normally very slow and done over UDP
• If a client cannot find a Supernode, then regardless of authentication via the central server it is NOT allowed on the Skype network.
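The ratio in the comment is easy to turn into a quick back-of-the-envelope calculation. The sketch below only restates the commenter's N / 300 rule of thumb; the function name and the default of 300 clients per Supernode are my own illustrative choices, not anything taken from Skype itself.

```python
def estimated_supernodes(online_users: int, clients_per_supernode: int = 300) -> int:
    """Estimate the Supernode count with the comment's N / 300 rule of thumb.

    Both the ratio and this helper are assumptions for illustration only.
    """
    if online_users < 0:
        raise ValueError("online_users must not be negative")
    if clients_per_supernode <= 0:
        raise ValueError("clients_per_supernode must be positive")
    # Integer division: a partial group does not count as a full Supernode.
    return online_users // clients_per_supernode


if __name__ == "__main__":
    # The commenter's example: 3.0 million users online.
    print(estimated_supernodes(3_000_000))
```

If the ratio really were that fixed, every Supernode that drops out would strand a few hundred ordinary clients at once, which is the point the rest of the comment builds on.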
Lack of Supernodes means lack of network connectivity, regardless of successful login via the “central server”.
You CAN be a Supernode but not have full network connectivity, because you have only a portion of the “Distributed Index Data, aka DHT”.
MOST people that become Supernodes will bail out if they cannot keep a clear route (aka calls bail out, the client restarts and aborts Supernode status), thus booting its 300 – 500 children and putting them into a “Connecting” mode.
Children that are trying to “Connect” are unable to do anything unless they have a “Supernode” as a parent. // No calls, no IM…
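To make the parent/child dependency concrete, here is a toy model of the rule as the commenter describes it: a client is only usable when it has both logged in and found a Supernode parent. The `Client` class, the state names, and `drop_supernode` are all hypothetical and invented for this sketch; the real client's internals are not public.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Client:
    name: str
    authenticated: bool = False      # login against the central servers
    parent: Optional[str] = None     # name of the Supernode parent, if any

    def state(self) -> str:
        """Return a coarse status following the comment's description."""
        if not self.authenticated:
            return "offline"
        if self.parent is None:
            # Logged in, but no Supernode: no calls, no IM.
            return "connecting"
        return "online"


def drop_supernode(clients: List[Client], supernode_name: str) -> int:
    """Orphan every child of the given Supernode; return how many were hit."""
    orphaned = 0
    for client in clients:
        if client.parent == supernode_name:
            client.parent = None
            orphaned += 1
    return orphaned


if __name__ == "__main__":
    children = [Client(f"c{i}", authenticated=True, parent="sn1") for i in range(300)]
    print(drop_supernode(children, "sn1"), children[0].state())
```

The login flag never changes in this model, yet all 300 children fall back to "connecting" the moment their parent goes away, which matches the claim that fixing the central servers alone does not restore service.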
The overview of this is as follows:
Skype introduced a flaw into the network that dealt with “routing” and “fucked” the “decentralized data store, aka DHT”. This in turn sent clients on a RANDOM search for Supernodes, which at this point were well booted off of the network.
In the End:
It is a huge cycle: no matter how many bugs they “fix” in the “central servers”, it will take many days for N nodes to become Supernodes so they can route X data from peer A to peer B. This is NOT minor: a fix to the centralized server code base cannot relay data to N Supernodes when there is a lack thereof, which results in a very segregated network. Right now there are approximately 10,000 sub-Skype networks instead of 1 single “in sync” network. When this “data store” (see DHT) is in sync globally, the Skype network will be STABLE again.
I know this is very broad, but unless all of said nodes can magically recreate the “single overlay (DHT)”, nothing will be in sync. You will see delayed messages and delayed or incorrect profiles and presence.
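The “10,000 sub-networks versus 1 network” picture is a statement about connected components in the overlay. As a rough illustration, the sketch below counts partitions among a set of Supernodes given the links between them, using a plain union-find. The node names and link list are made up, and nothing here reflects how Skype actually tracks its overlay.

```python
from typing import Dict, Iterable, List, Tuple


def count_partitions(nodes: List[str], links: Iterable[Tuple[str, str]]) -> int:
    """Count disconnected groups of nodes (connected components).

    Every name that appears in `links` must also appear in `nodes`.
    """
    parent: Dict[str, str] = {node: node for node in nodes}

    def find(x: str) -> str:
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b   # merge the two groups

    return len({find(node) for node in nodes})


if __name__ == "__main__":
    supernodes = ["s1", "s2", "s3", "s4"]
    print(count_partitions(supernodes, []))                      # every node isolated
    print(count_partitions(supernodes, [("s1", "s2"), ("s2", "s3"), ("s3", "s4")]))
```

With no links each Supernode is its own island; only once enough links are re-established does the count fall back to one, which is the “in sync” state the commenter is waiting for.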
My take, in the end, is to give it 48 more hours and it may be semi-stable, but hey, this is what you get when you use end users as your own redundancy…