Almost two years ago I wrote a post describing how to translate text using Azure Cognitive Services. However, the API it uses is to be switched off, so I needed to migrate from the version 2 API to version 3.
Whilst most of the code I post on this blog is used in one form or another, I've been using the code presented in that article as-is for the past two years. OK, I changed the namespace. But otherwise it's identical.
Although I have finally stopped using older classes such as HttpWebRequest in favour of newer await-friendly alternatives, so far I haven't updated existing code to make use of them. As I noted above, I'm still using the TranslationClient class introduced in my previous blog post and at this time I simply want to retrofit the class to use the v3 API.
This also means I'm still not using any of the extra features offered by the API. It probably makes more sense to combine some of the functionality now (as Microsoft have done with the APIs themselves), but as I want the new class to be a drop-in replacement for the old, I have left this as an exercise for a future blog post.
The official migration documentation can be found on Microsoft's site.
At first glance, the biggest change between v2 and v3 is the output format: previously it was XML, now it is JSON. This is a bit of a double-edged sword, as while JSON is the standard these days, XML parsing is built into the .NET Framework and JSON parsing is not (yet).
JSON.net is a fine library for working with JSON, but
thanks to the way NuGet works it quickly spread like a plague
through my application libraries, and so I ended up blanket
purging it. Instead, for some time I've been using a modified
version of the fantastic PetaJson, which is a single
file I embed in any projects that require JSON support.
The switch from XML to JSON does mean that a reference to
System.Runtime.Serialization is no longer required.
I'm already only using a limited subset of functionality via three separate version 2 APIs. In version 3, two of these have been consolidated into one. The following table outlines the different endpoints.
|v2 Method|v3 Method|
|---|---|
|GetLanguagesForTranslate|languages|
|GetLanguageNames|languages|
|Translate|translate|
In addition, the base URI has changed.
Although you can simply use the default base URI and have Azure choose an appropriate data centre, you can optionally specify a particular region instead.
All requests to the API (apart from the initial authentication)
need to include the
api-version query parameter, although
currently the only supported value is
3.0. Failure to include
this will result in a
400 status code along with an error message in the
response body.
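To make the endpoint changes concrete, here is a small sketch of building v3 request URIs. It is not the code from the demonstration project: the TranslatorUris class and its Build method are names invented for this example, and the two host names are the global and North America endpoints as I understand them from Microsoft's v3 reference, so check them against the current documentation before relying on them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TranslatorUris
{
    // Global endpoint (assumption, see above): Azure chooses the data centre
    public const string GlobalBaseUri = "https://api.cognitive.microsofttranslator.com";

    // One regional alternative (assumption, see above)
    public const string NorthAmericaBaseUri = "https://api-nam.cognitive.microsofttranslator.com";

    // Currently the only supported value
    public const string ApiVersion = "3.0";

    // Combines base URI, path and query, always adding api-version first
    public static Uri Build(string baseUri, string path, IDictionary<string, string> query = null)
    {
        StringBuilder sb = new StringBuilder();

        sb.Append(baseUri.TrimEnd('/')).Append('/').Append(path.TrimStart('/'));
        sb.Append("?api-version=").Append(Uri.EscapeDataString(ApiVersion));

        if (query != null)
        {
            foreach (KeyValuePair<string, string> pair in query)
            {
                sb.Append('&').Append(Uri.EscapeDataString(pair.Key));
                sb.Append('=').Append(Uri.EscapeDataString(pair.Value));
            }
        }

        return new Uri(sb.ToString());
    }
}
```

For example, calling `TranslatorUris.Build(TranslatorUris.GlobalBaseUri, "languages", new Dictionary<string, string> { { "scope", "translation" } })` should give a URI ending in `/languages?api-version=3.0&scope=translation`.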
I'm using the same code to obtain an authentication token as I was for the version 2 API; as far as I know this isn't going to be removed. Please see the original article for details.
According to the reference, instead of generating an access token from your API key, you can pass the key directly via the
Ocp-Apim-Subscription-Key header. Given that this was also supported in the v2 API, I'm not sure why I chose the more convoluted method of generating an access token. It is something else to potentially refactor away in a future update, especially given that this exact code has had a bug in it for over two years now.
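For illustration, the two ways of authenticating a request could be applied to an HttpWebRequest roughly as follows. The TranslatorAuth class and its method names are hypothetical; the only details taken from the API are the Ocp-Apim-Subscription-Key header and the bearer token form of the Authorization header.

```csharp
using System;
using System.Net;

static class TranslatorAuth
{
    // Option 1: pass the API key directly with each request
    public static void ApplySubscriptionKey(HttpWebRequest request, string apiKey)
    {
        request.Headers["Ocp-Apim-Subscription-Key"] = apiKey;
    }

    // Option 2: pass a previously obtained access token as a bearer token
    public static void ApplyAccessToken(HttpWebRequest request, string accessToken)
    {
        request.Headers[HttpRequestHeader.Authorization] = "Bearer " + accessToken;
    }
}
```

Passing the key directly saves a round trip to the token endpoint, at the cost of sending the key itself with every call.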
Imagine my surprise when the first thing that happened after changing the URI constants was that the program crashed in a place I wasn't expecting! As it turns out, there was a bug in the original code which just happened to have worked up until now.
When requesting an API token, the token is the body of the
response. The class has a private
method for pulling this out (which incidentally is also useful for
debugging purposes). This method checks to see if a character
set is defined on the
HttpWebResponse (via the CharacterSet
property) and if so uses that to read the text appropriately,
otherwise it falls back to UTF-8.
At least, that was the theory. In reality, if a character set is present UTF-8 is always used, and if not present it tries to use the null object and crashes. ReSharper very helpfully warns you of this very thing with its "Possible 'null' assignment to entity marked with 'NotNull' attribute" warning, and I completely ignored it as I'm so used to seeing it with various file APIs that evidently I treat it as noise without paying attention.
Oops. Well, it's fixed now!
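A minimal sketch of the intended behaviour is shown below. This is not the exact code from the class: the ResponseText name and its two methods are made up for the example, and falling back to UTF-8 for unrecognised character set names is an extra assumption on top of what is described above.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class ResponseText
{
    // Uses the character set named by the response when there is one,
    // otherwise (or if the name isn't recognised) falls back to UTF-8
    public static Encoding ResolveEncoding(string characterSet)
    {
        if (string.IsNullOrWhiteSpace(characterSet))
        {
            return Encoding.UTF8;
        }

        try
        {
            return Encoding.GetEncoding(characterSet);
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Reads the entire response body as text
    public static string Read(HttpWebResponse response)
    {
        Encoding encoding = ResolveEncoding(response.CharacterSet);

        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The important part is that the null check happens before the call to Encoding.GetEncoding, which throws if it is handed a null name.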
The GetLanguagesForTranslate API has been replaced with
languages and rather than returning a simple list of language
codes, it now returns a little bit more. At the most basic
level it includes the name (native and localised) and the
text direction of each language. Using the
scope query parameter, you specify a comma separated
list of group information to return. The available group names
are translation, transliteration and dictionary. As I'm
only interested in translations, that is the only scope I'll
provide. As an aside, if you omit this parameter it will act as
if you had specified all scopes.
The GetLanguages function changes to match.
I have to admit, I'm not a fan of this awful "dictionary of
dictionary of dictionaries" nonsense. But as the translation
element is an object with language codes as property names
rather than an array, offhand I'm not sure how I'd get that
converted into a strongly typed keyed collection, regardless of
whether I'm using PetaJson or JSON.net. I will be revisiting this in a
future post.
I'm also not a fan of having to load the entire JSON string into
parsed objects and then discard most of it. PetaJson has a
Reader class and
ideally I should have used that to walk the JSON.
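As a rough illustration of pulling just the language codes out of the response without building nested dictionaries, here is a sketch that uses System.Text.Json in place of PetaJson. That substitution is purely for the example (it assumes .NET Core 3.0 or later, or the System.Text.Json NuGet package), the LanguagesParser name is made up, and the sample JSON is a trimmed, hand-written imitation of the response shape described above rather than real API output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class LanguagesParser
{
    // Returns the language codes found under the "translation" element
    public static List<string> GetLanguageCodes(string json)
    {
        List<string> codes = new List<string>();

        using (JsonDocument document = JsonDocument.Parse(json))
        {
            if (document.RootElement.TryGetProperty("translation", out JsonElement translation))
            {
                // The element is an object keyed by language code, not an array
                foreach (JsonProperty language in translation.EnumerateObject())
                {
                    codes.Add(language.Name);
                }
            }
        }

        return codes;
    }

    static void Main()
    {
        string sample = "{\"translation\":{"
                      + "\"de\":{\"name\":\"German\",\"nativeName\":\"Deutsch\",\"dir\":\"ltr\"},"
                      + "\"fr\":{\"name\":\"French\",\"nativeName\":\"Français\",\"dir\":\"ltr\"}}}";

        Console.WriteLine(string.Join(", ", GetLanguageCodes(sample))); // prints: de, fr
    }
}
```

This still parses the whole document, but it never materialises more than the list of codes.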
In my implementation, I've left in place the code for obtaining and setting an authentication token. However, unlike the v2 API, authentication is not required for using the
/languages API. It is still required for actions that incur billing, such as translating text.
As I've laboriously noted above, in the v3 API Microsoft
combined the original GetLanguagesForTranslate and
GetLanguageNames into a single API call, and so getting the
actual names for each language is a simple case of taking the
above code and pulling out a little more information from the response.
Remembering that the JSON output includes name, nativeName and
dir attributes; this time around, we're interested in
pulling out the
name field. This is the display name in the
requested locale (
nativeName is the display name in the locale
of the language itself). But how do you specify the requested
locale? In v2, you used the
locale query parameter but for v3
it is done by setting the Accept-Language header.
There's also another important difference. With the v2 API, you
made a POST and the body had a list of the languages for which
you wanted localised names. However, for v3 there is no such
filtering available; it will return localised names for all languages.
As I'm trying to keep the same behaviour, that means I'm going to need to add this filtering myself (although by the time I'd finished this article I was questioning my reasoning for not just rewriting the class from scratch in a modern fashion and forcing our internal application to deal with it).
I really don't like this code. Too late for second guessing now though!
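For illustration only, the locale header and the client-side filtering could be sketched as below. This is not the code from the demonstration project: it uses System.Text.Json rather than PetaJson (so assumes .NET Core 3.0 or later, or the NuGet package), the LanguageNames class and its methods are invented names, the sample JSON is hand-written, and returning one name per requested code in the same order (falling back to the code itself) is an assumption about what the caller expects.

```csharp
using System;
using System.Net;
using System.Text.Json;

static class LanguageNames
{
    // v3 chooses the locale of the "name" field from the Accept-Language header
    public static HttpWebRequest CreateRequest(Uri languagesUri, string locale)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(languagesUri);
        request.Method = "GET";
        request.Headers[HttpRequestHeader.AcceptLanguage] = locale;
        return request;
    }

    // v3 returns every language, so pick out only the requested codes.
    // Property lookups are case sensitive, so codes must match exactly.
    public static string[] GetNames(string json, string[] languageCodes)
    {
        string[] names = new string[languageCodes.Length];

        using (JsonDocument document = JsonDocument.Parse(json))
        {
            JsonElement translation = document.RootElement.GetProperty("translation");

            for (int i = 0; i < languageCodes.Length; i++)
            {
                if (translation.TryGetProperty(languageCodes[i], out JsonElement language)
                    && language.TryGetProperty("name", out JsonElement name))
                {
                    names[i] = name.GetString();
                }
                else
                {
                    names[i] = languageCodes[i]; // unknown code, fall back to the code
                }
            }
        }

        return names;
    }

    static void Main()
    {
        string sample = "{\"translation\":{"
                      + "\"de\":{\"name\":\"German\",\"nativeName\":\"Deutsch\",\"dir\":\"ltr\"},"
                      + "\"fr\":{\"name\":\"French\",\"nativeName\":\"Français\",\"dir\":\"ltr\"}}}";

        Console.WriteLine(string.Join(", ", GetNames(sample, new[] { "fr", "xx" }))); // prints: French, xx
    }
}
```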
The final part of this migration exercise is the actual text translation. Again, there are some small differences from v2 but nothing too troublesome.
Firstly, the text to translate is no longer a query parameter,
but part of the body text as a JSON object. This makes sense in
a way as for v3, Microsoft merged the Translate and
TranslateArray APIs into one. But it still means it's slightly
more awkward to use.
The body JSON is simple enough: an array of objects, each carrying the text to translate in a Text attribute.
Note that for some reason the
Text attribute is in title case rather than the lower case used in all the other examples.
The languages to convert from and to are still specified via the from and
to query parameters as with v2.
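Putting the request side together, a bare-bones sketch of the call might look like the following. It is deliberately simplified compared to a real implementation: the TranslateCall class and Post method are hypothetical names, the body is passed in as an already serialised JSON string, the access token is assumed to have been obtained beforehand, and there is no handling of the WebException that GetResponse throws for error status codes.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class TranslateCall
{
    // POSTs the JSON body to the translate API and returns the raw JSON response
    public static string Post(string baseUri, string from, string to, string bodyJson, string accessToken)
    {
        // Languages stay in the query string, the text moves to the body
        string uri = baseUri.TrimEnd('/') + "/translate?api-version=3.0"
                   + "&from=" + Uri.EscapeDataString(from)
                   + "&to=" + Uri.EscapeDataString(to);

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "POST";
        request.ContentType = "application/json; charset=utf-8";
        request.Headers[HttpRequestHeader.Authorization] = "Bearer " + accessToken;

        byte[] payload = Encoding.UTF8.GetBytes(bodyJson);
        request.ContentLength = payload.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```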
The response is a JSON array, with one element for each text item sent.
However, it can include a great deal more information depending on whether you use auto detection, transliteration and more. I'm not covering any of that here in my 1:1 conversion.
As I don't really want to manually write JSON and deal with having to escape text, I'll create an interim object and use PetaJson to write it out. I've made it private for now as it is only used inside this method. It was also at this point I threw up my hands in disgust at more dictionaries of dictionaries and wrote a few limited POCOs for the response output that I'm interested in.
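The helper objects could look something like the sketch below. Again this swaps PetaJson for System.Text.Json purely for illustration (assuming .NET Core 3.0 or later, or the NuGet package), all of the class names are invented, and only the Text, translations, text and to properties come from the API. The sample response is hand-written in the shape the API returns.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Request body item; the property name is title case, as the API examples show
sealed class TranslateRequestItem
{
    public string Text { get; set; }
}

// Limited response objects covering only the fields of interest
sealed class TranslateResponseItem
{
    [JsonPropertyName("translations")]
    public List<TranslationResult> Translations { get; set; }
}

sealed class TranslationResult
{
    [JsonPropertyName("text")]
    public string Text { get; set; }

    [JsonPropertyName("to")]
    public string To { get; set; }
}

static class TranslatePayload
{
    // The API expects an array, even for a single piece of text
    public static string CreateBody(string text)
    {
        return JsonSerializer.Serialize(new[] { new TranslateRequestItem { Text = text } });
    }

    // Returns the first translation of the first item, or null if there isn't one
    public static string GetFirstTranslation(string json)
    {
        List<TranslateResponseItem> items = JsonSerializer.Deserialize<List<TranslateResponseItem>>(json);

        if (items == null || items.Count == 0
            || items[0].Translations == null || items[0].Translations.Count == 0)
        {
            return null;
        }

        return items[0].Translations[0].Text;
    }

    static void Main()
    {
        Console.WriteLine(CreateBody("Hello")); // prints: [{"Text":"Hello"}]

        string sample = "[{\"translations\":[{\"text\":\"Hallo\",\"to\":\"de\"}]}]";
        Console.WriteLine(GetFirstTranslation(sample)); // prints: Hallo
    }
}
```

Letting the serialiser write the body means quotes and other awkward characters in the source text are escaped for you.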
With the helpers in place, I can now expand the
translation method to work with the v3 API.
Much more complicated than the previous version! Still it works. Doesn't it?
After I had the conversion complete, I noticed that one of the variations of Klingon wasn't listed in the language list any more. Curious, I ran the original application and back it popped. At first I thought they might have been combined with the new script support but this doesn't seem to be the case. Fortunately, no user has asked for our software to be in Klingon, so I can ignore this omission!
I also noted the codes for Chinese have changed. In v2 they are
zh-CHS (Simplified) and
zh-CHT (Traditional), but in v3 they
are zh-Hans and zh-Hant. Apparently the latter is the
proper way of doing things now, but this is a breaking change for
me as various shell scripts and data files refer to the old
style and will need changing.
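One way to soften this breaking change, sketched here as a suggestion rather than something the demonstration project does, is to map the old codes to the new ones before calling the API. The LanguageCodeMigration class is a made-up name and the table only covers the two Chinese codes mentioned above.

```csharp
using System;
using System.Collections.Generic;

static class LanguageCodeMigration
{
    // Old v2 codes and their v3 replacements
    private static readonly Dictionary<string, string> _v2ToV3 =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "zh-CHS", "zh-Hans" }, // Simplified
            { "zh-CHT", "zh-Hant" }  // Traditional
        };

    // Returns the v3 equivalent of a v2 code, or the code unchanged if there isn't one
    public static string ToV3(string code)
    {
        return _v2ToV3.TryGetValue(code, out string mapped) ? mapped : code;
    }

    static void Main()
    {
        Console.WriteLine(ToV3("zh-CHS")); // prints: zh-Hans
        Console.WriteLine(ToV3("de"));     // prints: de
    }
}
```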
Even more oddly however, the first part of the "Major-General's Song" that defaults in the demonstration program now translates differently in the two versions
German Translation (version 2 API):
German Translation (version 3 API):
I have no idea why this is. I assume it's because, according to the documentation, it uses "neural machine translation by default", although it doesn't seem to state how to disable it.
In the end, I updated the demonstration program to include both the v2 and v3 classes so I could toggle between them to easily see the differences.
Attached to this post is an upgraded demonstration project which is a little more robust than the methods above; it is also available on our GitHub page. Note that you will need to use your own API key, as the one in the demonstration program has been invalidated.
I'm really not a fan of the new code and have made a note on my blog Todo list to revisit this topic in the future and rewrite it properly using modern techniques, and also to investigate some of the additional functionality the translation API offers.
- 2019-04-11 - First published
- 2020-11-22 - Updated formatting