Content Marketing

What can we learn from the Yandex source code leak?

POST WRITTEN BY: Alisha Taylor

On the 27th of January 2023, it was first reported that Russia’s largest search engine, Yandex, had experienced a technical error that caused its source code to be leaked to the public for the first time. Whilst this is obviously a big deal for Yandex itself, it also raises the question of what we can learn from the inner workings of this search engine, and what they might tell us about other major search engines, including Google.

Breaking down the data, Alexander Buraks of DiscoverCars.com shared his findings in a Twitter thread that has now reached over one million views.

So, what are the key similarities between Yandex and Google in terms of their algorithm?

There is a RankBrain analogue – MatrixNet
They are using PageRank – almost identical to Google
Many of their text algorithms are the same

Whilst there are still some differences between the two platforms, Buraks points out that many former ‘Googlers’ work for Yandex, and that the search engine was originally built to be a Google clone, suggesting that the similarities may outweigh the differences.

In his tweet, Buraks states “In practice: comparing Google vs Yandex search results they are a ~70% match.”

If this is the case, there is potentially a lot that can be learnt from this Yandex leak in terms of building better practices for Google search optimization specifically. For those of us in the PR field, this could essentially be the Enigma Code of SEO.

Alex Buraks continues to break down the data from the leak, stating that the age of the links is a ranking factor in Yandex’s algorithm. Similarly, the age of the document and the date of the most-recent update are also ranking factors.

Additionally, a site’s traffic and the percentage of that traffic that is organic also help to determine its ranking. This suggests that sites relying heavily on paid traffic, such as PPC (pay-per-click), rather than organic visits may see their rankings suffer in the long term.

Buraks goes on to say that URLs containing numbers will not do as well in the rankings, nor will URLs containing an excessive number of slashes. Host reliability is another determining factor, meaning that websites returning fewer 4xx/5xx errors will perform better in Yandex search results and garner more organic traffic.
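To make the URL points concrete, here is a minimal Python sketch of how you might audit your own URLs for digits and slash depth. It is not taken from the leaked code: the function name `url_signals` and the `max_slashes` threshold of 3 are our own assumptions, chosen purely for illustration.

```python
from urllib.parse import urlparse


def url_signals(url: str, max_slashes: int = 3) -> dict:
    """Return simple URL hygiene flags for one URL.

    The max_slashes threshold is an illustrative assumption,
    not a value from Yandex's code.
    """
    path = urlparse(url).path
    slash_count = path.count("/")
    return {
        # True if any digit appears in the path part of the URL
        "has_digits": any(ch.isdigit() for ch in path),
        "slash_count": slash_count,
        "too_many_slashes": slash_count > max_slashes,
    }


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain
    for url in (
        "https://example.com/blog/yandex-leak",
        "https://example.com/blog/2023/01/27/post-123/",
    ):
        print(url, url_signals(url))
```

Running a check like this across a sitemap would give a quick list of URLs worth simplifying, although how much weight any search engine gives these signals remains unknown.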

Interestingly, the algorithm at Yandex contains a specific ranking factor just for uplifting Wikipedia. This means that pages containing links to Wikipedia entries, along with Wikipedia entries themselves, will perform better on the algorithm. Additionally, traffic coming from Wikipedia is another ranking factor, meaning that those being cited as sources for Wikipedia entries will fare better on Yandex’s algorithm.

In an additional Twitter thread, Buraks continues to analyse data from the source code leak. He reports that important pages should be accessible within three clicks, and that backlinks from main pages are actually more important than backlinks from internal pages. This is particularly relevant for those who regularly use blogs and other internal content to boost traffic elsewhere on-site.
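The three-click guideline is easy to check on your own site. The sketch below assumes you already have a map of internal links (for example, from a crawl) and uses a breadth-first search to count the clicks from the homepage to each page. The site structure, function names and `max_clicks` default are hypothetical; this is one way to run the audit, not Yandex's method.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links: Dict[str, List[str]], home: str, max_clicks: int = 3) -> List[str]:
    """List reachable pages that need more than max_clicks clicks from home."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


if __name__ == "__main__":
    # Hypothetical internal link map: page -> pages it links to
    site = {
        "/": ["/blog", "/services"],
        "/blog": ["/blog/page-2"],
        "/blog/page-2": ["/blog/old-post"],
        "/blog/old-post": ["/blog/archive-note"],
    }
    print(pages_too_deep(site, "/"))
```

Any page this flags could be brought closer to the homepage, for instance by linking to it from a main navigation or hub page.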

Unsurprisingly, the number of search queries for your site’s URL is a ranking factor, with higher volumes of searches equating to a better ranking position. Another unsurprising factor is the use of keywords within the URL itself, alongside the copy. However, interestingly, the algorithm also takes the percentage of capital letters within the title of the web content into consideration, which Buraks notes with the following:

#12 Percentage of CAPITAL LETTERS in <title> is a ranking factor.

Really, how ofter do you see fully capitalized titles for website in top of Google? pic.twitter.com/DH86dXwNGn

— Alex Buraks (@alex_buraks) January 28, 2023
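If you want to check your own page titles against this factor, the share of capital letters is simple to calculate. The helper below is a rough sketch under our own assumptions: the name `capital_letter_share` is hypothetical, and the leak does not tell us what percentage, if any, counts as too high.

```python
def capital_letter_share(title: str) -> float:
    """Fraction of alphabetic characters in a title that are upper case."""
    letters = [ch for ch in title if ch.isalpha()]
    if not letters:
        return 0.0  # no letters at all, so nothing to measure
    return sum(ch.isupper() for ch in letters) / len(letters)


if __name__ == "__main__":
    for title in (
        "What can we learn from the Yandex source code leak?",
        "WHAT CAN WE LEARN FROM THE YANDEX SOURCE CODE LEAK?",
    ):
        print(f"{capital_letter_share(title):.0%}  {title}")
```

A sentence-case title scores only a few per cent, whereas a fully capitalised one scores 100%, which makes outliers easy to spot in a list of titles.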

With short-form media taking the world by storm in recent years, and the likes of TikTok and YouTube Shorts soaring in popularity, there is also a ranking factor specifically for short videos. Similarly, videos embedded within a web page are favoured by the algorithm, whereas broken embedded videos will harm search engine rankings.

We caught up with Alexander to get his thoughts on the matter.

Commenting on the leak earlier this week, Alexander Buraks said:

“The analysis that I posted on Twitter is quite simple and superficial. I spent a few days researching the leaked source code before posting it, but right now I understand that it is only a small glimpse of the full picture. The most interesting thing for me was that we got the answer to the majority of our guesses.”

Alexander continues, “Basically, we were able to determine that almost everything that could potentially/logically be a ranking factor is an actual ranking factor – at least in Yandex. There was a lot of discussion about how different/similar Yandex vs Google are in terms of search results and principles of work. Of course, Yandex is not a full copy of Google. But, by comparing SERPs and analysing work principles, everyone can find that there are a lot of similarities.”

Final thoughts

If we are to believe that many of these factors carry over into Google’s very own search engine algorithm, then the Yandex leak may have just unintentionally revealed all of the tools required to get your content performing very well indeed within Google search results. However, it’s impossible to say for sure just how similar this source code is to Google’s, and it’s extremely unlikely that they’ll be forthcoming with that information any time soon.

What we can take away from this leak, however, is that these factors are likely to play at least some part in determining the ranking of your content within search results. Whether they’re as important to Google as they are to Yandex will remain a mystery, but experimenting with the factors revealed by this leak and looking out for SEO improvements is unlikely to do any harm.
