How to Use the Google Cloud Vision API in Android Apps

Computer vision is considered an AI-complete problem. In other words, solving it would be equivalent to creating a program that’s as smart as humans. Needless to say, such a program is yet to be created. However, if you’ve ever used apps like Google Goggles or Google Photos—or watched the segment on Google Lens in the keynote of Google I/O 2017—you probably realize that computer vision has become very powerful.

Through a REST-based API called Cloud Vision API, Google shares its revolutionary vision-related technologies with all developers. By using the API, you can effortlessly add impressive features such as face detection, emotion detection, and optical character recognition to your Android apps. In this tutorial, I’ll show you how.


To be able to follow this tutorial, you must have:

  • a Google Cloud Platform account
  • a project on the Google Cloud console
  • the latest version of Android Studio
  • and a device that runs Android 4.4 or higher

If some of the above requirements sound unfamiliar to you, I suggest you read the following introductory tutorial about the Google Cloud Machine Learning platform:

  • How to Use Google Cloud Machine Learning Services for Android, by Ashraff Hathibelagal (Android SDK)

1. Enabling the Cloud Vision API

You can use the Cloud Vision API in your Android app only after you’ve enabled it in the Google Cloud console and acquired a valid API key. So start by logging in to the console and navigating to API Manager > Library > Vision API. In the page that opens, simply press the Enable button.

[Image: Enable Cloud Vision API]

If you’ve already generated an API key for your Cloud console project, you can skip to the next step because you will be able to reuse it with the Cloud Vision API. Otherwise, open the Credentials tab and select Create Credentials > API key.

[Image: Create API key]

In the dialog that pops up, you will see your API key.

2. Adding Dependencies

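Like most other APIs offered by Google, the Cloud Vision API can be accessed using the Google API Client library. To use the library in your Android Studio project, add compile dependencies for the com.google.api-client:google-api-client-android and com.google.apis:google-api-services-vision artifacts to the app module’s build.gradle file. If your version of the Android Gradle plugin no longer supports the compile configuration, use implementation instead.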

Furthermore, to simplify file I/O operations, I suggest you also add a compile dependency for the Apache Commons IO library.

Because the Google API Client can work only if your app has the INTERNET permission, make sure your project’s manifest file contains a uses-permission element whose android:name attribute is android.permission.INTERNET.

3. Configuring the API Client

You must configure the Google API client before you use it to interact with the Cloud Vision API. Doing so primarily involves specifying the API key, the HTTP transport, and the JSON factory it should use. As you might expect, the HTTP transport will be responsible for communicating with Google’s servers, and the JSON factory will, among other things, be responsible for converting the JSON-based results the API generates into Java objects. 

For modern Android apps, Google recommends that you use the NetHttpTransport class as the HTTP transport and the AndroidJsonFactory class as the JSON factory.

The Vision class represents the Google API Client for Cloud Vision. Although it is possible to create an instance of the class using its constructor, doing so using the Vision.Builder class instead is easier and more flexible.

While using the Vision.Builder class, you must remember to call the setVisionRequestInitializer() method and pass it a VisionRequestInitializer object created with your API key.

Once the Vision.Builder instance is ready, you can call its build() method to generate a new Vision instance you can use throughout your app.
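As a minimal sketch of the configuration described above, the client could be assembled as follows. The VisionClientFactory class and its create() method are hypothetical names used only for illustration. Passing null as the third argument of the Vision.Builder constructor assumes you don’t need an additional HTTP request initializer.

```java
import com.google.api.client.extensions.android.json.AndroidJsonFactory;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.services.vision.v1.Vision;
import com.google.api.services.vision.v1.VisionRequestInitializer;

// Hypothetical helper class, not part of the library.
public final class VisionClientFactory {

    private VisionClientFactory() {}

    // apiKey is the key you generated in the Google Cloud console.
    public static Vision create(String apiKey) {
        Vision.Builder visionBuilder = new Vision.Builder(
                new NetHttpTransport(),    // HTTP transport
                new AndroidJsonFactory(),  // JSON factory
                null);                     // assumption: no extra request initializer

        // Attach the API key to every request made by this client.
        visionBuilder.setVisionRequestInitializer(
                new VisionRequestInitializer(apiKey));

        return visionBuilder.build();
    }
}
```

You would typically call create() once, with your own key, and reuse the Vision instance it returns.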

At this point, you have everything you need to start using the Cloud Vision API.

4. Detecting and Analyzing Faces

Detecting faces in photographs is a very common requirement in computer vision-related applications. With the Cloud Vision API, you can create a highly accurate face detector that can also identify emotions, lighting conditions, and face landmarks.

For the sake of demonstration, we’ll be running face detection on the following photo, which features the crew of Apollo 9:

[Image: Sample photo for face detection]

I suggest you download a high-resolution version of the photo from Wikimedia Commons and place it in your project’s res/raw folder.

Step 1: Encode the Photo

The Cloud Vision API expects its input image to be encoded as a Base64 string that’s placed inside an Image object. Before you generate such an object, however, you must convert the photo you downloaded, which is currently a raw image resource, into a byte array. You can quickly do so by opening its input stream using the openRawResource() method of the Resources class and passing it to the toByteArray() method of the IOUtils class.

Because file I/O operations should not be run on the UI thread, make sure you spawn a new thread before opening the input stream.
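One way to do this is sketched below. The PhotoLoader class and its Callback interface are hypothetical names. The resource ID you pass in (for example R.raw.photo, if you saved the file as photo.jpg) is also an assumption, since the file name is up to you.

```java
import android.content.res.Resources;
import android.util.Log;

import org.apache.commons.io.IOUtils;

import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class, not part of Android or Commons IO.
public final class PhotoLoader {

    // Hypothetical callback; it is invoked on the background thread.
    public interface Callback {
        void onLoaded(byte[] photoData);
    }

    private PhotoLoader() {}

    public static void loadInBackground(final Resources resources,
                                        final int rawResourceId,
                                        final Callback callback) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                // Open the raw resource and read all of it into a byte array.
                try (InputStream inputStream =
                             resources.openRawResource(rawResourceId)) {
                    callback.onLoaded(IOUtils.toByteArray(inputStream));
                } catch (IOException e) {
                    Log.e("PhotoLoader", "Could not read the photo", e);
                }
            }
        }).start();
    }
}
```

Because the callback runs on the background thread, the remaining steps that involve network access can be continued from inside it.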

You can now create an Image object by calling its default constructor. To add the byte array to it as a Base64 string, all you need to do is pass the array to its encodeContent() method.
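The following sketch wraps those two calls in a small helper. The ImageEncoder class name is hypothetical, and photoData is assumed to be the byte array you read from the raw resource.

```java
import com.google.api.services.vision.v1.model.Image;

// Hypothetical helper class, not part of the library.
public final class ImageEncoder {

    private ImageEncoder() {}

    public static Image toVisionImage(byte[] photoData) {
        Image inputImage = new Image();
        // Stores the bytes in the Image object as a Base64 string.
        inputImage.encodeContent(photoData);
        return inputImage;
    }
}
```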

Step 2: Make a Request

Because the Cloud Vision API offers several different features, you must explicitly specify the feature you are interested in while making a request to it. To do so, you must create a Feature object and call its setType() method. For face detection only, pass it the string FACE_DETECTION.

Using the Image and the Feature objects, you can now compose an AnnotateImageRequest instance.

Note that an AnnotateImageRequest object must always belong to a BatchAnnotateImagesRequest object because the Cloud Vision API is designed to process multiple images at once. To initialize a BatchAnnotateImagesRequest instance containing a single AnnotateImageRequest object, you can use the Arrays.asList() utility method.
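Putting the Feature, AnnotateImageRequest, and BatchAnnotateImagesRequest objects together might look like the sketch below. The FaceRequestBuilder class is a hypothetical name, and inputImage is assumed to be the Image object created in the previous step.

```java
import com.google.api.services.vision.v1.model.AnnotateImageRequest;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesRequest;
import com.google.api.services.vision.v1.model.Feature;
import com.google.api.services.vision.v1.model.Image;

import java.util.Arrays;

// Hypothetical helper class, not part of the library.
public final class FaceRequestBuilder {

    private FaceRequestBuilder() {}

    public static BatchAnnotateImagesRequest build(Image inputImage) {
        // Ask for face detection only.
        Feature desiredFeature = new Feature();
        desiredFeature.setType("FACE_DETECTION");

        // One request: one image plus the features to run on it.
        AnnotateImageRequest request = new AnnotateImageRequest();
        request.setImage(inputImage);
        request.setFeatures(Arrays.asList(desiredFeature));

        // The API works on batches, so wrap the single request in a list.
        BatchAnnotateImagesRequest batchRequest =
                new BatchAnnotateImagesRequest();
        batchRequest.setRequests(Arrays.asList(request));
        return batchRequest;
    }
}
```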

To actually make the face detection request, you must call the execute() method of an Annotate object that’s initialized using the BatchAnnotateImagesRequest object you just created. To generate such an object, call the annotate() method of the object returned by the images() method of your Vision instance. Because execute() performs a network request, call it off the UI thread.
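A minimal sketch of that call is shown below. The FaceDetectionCall class is a hypothetical name. The vision and batchRequest arguments are assumed to be the client and the batch request created earlier.

```java
import com.google.api.services.vision.v1.Vision;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesRequest;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesResponse;

import java.io.IOException;

// Hypothetical helper class, not part of the library.
public final class FaceDetectionCall {

    private FaceDetectionCall() {}

    // Performs network I/O, so call it from a background thread.
    public static BatchAnnotateImagesResponse run(
            Vision vision,
            BatchAnnotateImagesRequest batchRequest) throws IOException {
        Vision.Images.Annotate annotate =
                vision.images().annotate(batchRequest);
        return annotate.execute();
    }
}
```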

Step 3: Use the Response

Once the request has been processed, you get a BatchAnnotateImagesResponse object containing the response of the API. Its getResponses() method returns one AnnotateImageResponse object for each request in the batch. For a face detection request, that response contains a FaceAnnotation object for each face the API has detected. You can get a list of all FaceAnnotation objects using its getFaceAnnotations() method.
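Here is a short sketch of how the faces could be pulled out of the response. The FaceResults class is a hypothetical name. Treating a null list as "no faces" is a defensive assumption on my part rather than documented behavior.

```java
import com.google.api.services.vision.v1.model.AnnotateImageResponse;
import com.google.api.services.vision.v1.model.BatchAnnotateImagesResponse;
import com.google.api.services.vision.v1.model.FaceAnnotation;

import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the library.
public final class FaceResults {

    private FaceResults() {}

    public static List<FaceAnnotation> facesOf(
            BatchAnnotateImagesResponse batchResponse) {
        // The batch held a single request, so read the first response.
        AnnotateImageResponse response = batchResponse.getResponses().get(0);
        List<FaceAnnotation> faces = response.getFaceAnnotations();

        // Assumption: a missing list means no faces were detected.
        return faces != null
                ? faces
                : Collections.<FaceAnnotation>emptyList();
    }
}
```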

A FaceAnnotation object contains a lot of useful information about a face, such as its location, its angle, and the emotion it is expressing. As of version 1, the API can only detect the following emotions: joy, sorrow, anger, and surprise.

To keep this tutorial short, let us now simply display the following information in a Toast:

  • The count of the faces
  • The likelihood that they are expressing joy

You can, of course, get the count of the faces by calling the size() method of the List containing the FaceAnnotation objects. To get the likelihood of a face expressing joy, you can call the intuitively named getJoyLikelihood() method of the associated FaceAnnotation object. 
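The likelihood is reported as a string. In version 1 of the API the possible values are UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, and VERY_LIKELY. If you want to turn those strings into a yes-or-no answer, a plain-Java helper along the following lines could work. The JoyCounter class, its method names, and the choice of LIKELY as the threshold are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; it only works with likelihood strings.
public final class JoyCounter {

    // Likelihood values ordered from least to most certain.
    private static final List<String> SCALE = Arrays.asList(
            "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
            "POSSIBLE", "LIKELY", "VERY_LIKELY");

    private JoyCounter() {}

    // True for LIKELY and VERY_LIKELY; false for anything else,
    // including null and unrecognized values.
    public static boolean isProbablyJoyful(String joyLikelihood) {
        return SCALE.indexOf(joyLikelihood) >= SCALE.indexOf("LIKELY");
    }

    // Counts how many of the given likelihoods pass the threshold.
    public static int countJoyful(List<String> joyLikelihoods) {
        int count = 0;
        for (String likelihood : joyLikelihoods) {
            if (isProbablyJoyful(likelihood)) {
                count++;
            }
        }
        return count;
    }
}
```

In the app, each string in the list would come from calling getJoyLikelihood() on one FaceAnnotation object.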

Note that because a simple Toast can only display a single string, you’ll have to concatenate all the above details. Additionally, a Toast can only be displayed from the UI thread, so make sure you create and show it inside a Runnable that you pass to the runOnUiThread() method.
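The concatenation can be sketched as a plain-Java helper. The FaceReport class, its build() method, and the wording of the message are hypothetical. The Android-specific part needs an Activity, so it appears only as comments and assumes the code runs inside one.

```java
import java.util.List;

// Hypothetical helper class that builds the text for the Toast.
public final class FaceReport {

    private FaceReport() {}

    // joyLikelihoods holds one getJoyLikelihood() value per detected face.
    public static String build(List<String> joyLikelihoods) {
        StringBuilder message = new StringBuilder();
        message.append("This photo has ")
                .append(joyLikelihoods.size())
                .append(" faces");
        for (int i = 0; i < joyLikelihoods.size(); i++) {
            message.append("\nFace ")
                    .append(i + 1)
                    .append(" joy: ")
                    .append(joyLikelihoods.get(i));
        }
        return message.toString();
    }

    // Inside an Activity, the string could then be shown like this:
    //
    // final String message = FaceReport.build(joyLikelihoods);
    // runOnUiThread(new Runnable() {
    //     @Override
    //     public void run() {
    //         Toast.makeText(getApplicationContext(),
    //                 message, Toast.LENGTH_LONG).show();
    //     }
    // });
}
```

Keeping the string building separate from the Toast makes it easy to check the message without running the app on a device.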
