A Guide to Text Recognition in React Native
Imagine converting handwritten or printed documents into digital text using your smartphone. This capability is not just a dream but a reality in the world of React Native.
Our journey started with experimentation with various third-party React Native text detection libraries and several machine learning tools, including ML Kit, TensorFlow, and Tesseract OCR.
After extensive exploration and research, we found that converting handwritten or printed text into digital text directly within React Native was not a straightforward task. We also looked for suitable JavaScript libraries for this purpose, but couldn't find any that showed significant promise or met our requirements.
Ultimately, we decided to go with native modules for both iOS and Android and successfully crafted an effective and efficient way to implement text detection from images in React Native using ML Kit.
Follow the steps below as we guide you through the process of extracting text from images.
1. Implement image capture or image picker functionality
We used the react-native-image-picker library to handle image capture and selection. If your requirements call for different camera functionality, you are free to select an alternative library that suits your specific needs.
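As a minimal sketch, assuming a recent version of react-native-image-picker with the promise-based API (the pickImage helper name is ours), picking a photo and grabbing its local URI looks like this:
import { launchImageLibrary } from 'react-native-image-picker';

// Open the gallery and return the local URI of the chosen photo.
// `assets` is undefined when the user cancels the picker.
const pickImage = async (): Promise<string | undefined> => {
  const result = await launchImageLibrary({ mediaType: 'photo' });
  return result.assets?.[0]?.uri;
};
The URI this returns is what we will hand to the native text detection modules below.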
2. Implement TextDetectionModule for native Android
In your project-level build.gradle file, add Google's Maven repository:
repositories {
  mavenLocal()
  google()
  mavenCentral()
}
Then add the dependencies for the ML Kit Android libraries to your module's app-level Gradle file, usually located at app/build.gradle:
dependencies {
  // your code
  implementation 'com.google.mlkit:text-recognition:16.0.0'
}
Now, in the android folder of your project, create TextDetectionModule.java:
package com.textdetection;
import android.net.Uri;
import android.util.Log;
import androidx.annotation.NonNull;
import com.facebook.react.bridge.Arguments;
import com.facebook.react.bridge.Promise;
import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.bridge.ReactContextBaseJavaModule;
import com.facebook.react.bridge.ReactMethod;
import com.facebook.react.bridge.WritableArray;
import com.facebook.react.bridge.WritableMap;
import com.google.android.gms.tasks.OnFailureListener;
import com.google.android.gms.tasks.OnSuccessListener;
import com.google.mlkit.vision.common.InputImage;
import com.google.mlkit.vision.text.Text;
import com.google.mlkit.vision.text.TextRecognition;
import com.google.mlkit.vision.text.TextRecognizer;
import com.google.mlkit.vision.text.latin.TextRecognizerOptions;
import java.io.IOException;
public class TextDetectionModule extends ReactContextBaseJavaModule {

  TextDetectionModule(ReactApplicationContext context) {
    super(context);
  }

  @Override
  public String getName() {
    return "TextDetectionModule";
  }

  @ReactMethod
  public void recognizeImage(String url, Promise promise) {
    Log.d("TextDetectionModule", "Url: " + url);
    Uri uri = Uri.parse(url);
    InputImage image;
    try {
      image = InputImage.fromFilePath(getReactApplicationContext(), uri);
      // When using the Latin script library
      TextRecognizer recognizer =
          TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);
      recognizer.process(image)
          .addOnSuccessListener(new OnSuccessListener<Text>() {
            @Override
            public void onSuccess(Text result) {
              WritableMap response = Arguments.createMap();
              WritableArray blocks = Arguments.createArray();
              for (Text.TextBlock block : result.getTextBlocks()) {
                WritableMap blockObject = Arguments.createMap();
                blockObject.putString("text", block.getText());
                blocks.pushMap(blockObject);
              }
              response.putArray("blocks", blocks);
              promise.resolve(response);
            }
          })
          .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
              promise.reject("text_recognition", e);
            }
          });
    } catch (IOException e) {
      // Reject instead of swallowing the error so the JS promise settles.
      promise.reject("text_recognition", e);
    }
  }
}
Here, we're passing both the application context and the file URI as arguments to the InputImage.fromFilePath() method:
InputImage image = InputImage.fromFilePath(getReactApplicationContext(), uri);
Next, initialize a TextRecognizer object from ML Kit:
TextRecognizer recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);
Pass the image to the process method:
recognizer.process(image)
    .addOnSuccessListener(new OnSuccessListener<Text>() {
      @Override
      public void onSuccess(Text result) {
        // Task completed successfully
      }
    })
    .addOnFailureListener(new OnFailureListener() {
      @Override
      public void onFailure(@NonNull Exception e) {
        // Task failed with an exception
      }
    });
Finally, ML Kit provides you with the extracted text. When text recognition succeeds, it creates a structured response containing recognized text blocks.
public void onSuccess(Text result) {
  WritableMap response = Arguments.createMap();
  WritableArray blocks = Arguments.createArray();
  for (Text.TextBlock block : result.getTextBlocks()) {
    WritableMap blockObject = Arguments.createMap();
    blockObject.putString("text", block.getText());
    blocks.pushMap(blockObject);
  }
  response.putArray("blocks", blocks);
  promise.resolve(response);
}
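On the JavaScript side, the value the promise resolves with therefore has the following shape. This is a TypeScript description of the structure the module builds above; the type names themselves are ours:
// Shape of the object resolved by the native module (type names are illustrative).
interface TextBlock {
  text: string;
}

interface TextDetectionResult {
  blocks: TextBlock[];
}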
To summarize everything on Android, here is the full recognizeImage method that we can utilize. Remember that the module must also be registered in a ReactPackage and added to the packages list in your MainApplication; otherwise it will not be visible from JavaScript.
@ReactMethod
public void recognizeImage(String url, Promise promise) {
  Uri uri = Uri.parse(url);
  InputImage image;
  try {
    image = InputImage.fromFilePath(getReactApplicationContext(), uri);
    // When using the Latin script library
    TextRecognizer recognizer =
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);
    recognizer.process(image)
        .addOnSuccessListener(new OnSuccessListener<Text>() {
          @Override
          public void onSuccess(Text result) {
            WritableMap response = Arguments.createMap();
            WritableArray blocks = Arguments.createArray();
            for (Text.TextBlock block : result.getTextBlocks()) {
              WritableMap blockObject = Arguments.createMap();
              blockObject.putString("text", block.getText());
              blocks.pushMap(blockObject);
            }
            response.putArray("blocks", blocks);
            promise.resolve(response);
          }
        })
        .addOnFailureListener(new OnFailureListener() {
          @Override
          public void onFailure(@NonNull Exception e) {
            promise.reject("text_recognition", e);
          }
        });
  } catch (IOException e) {
    // Reject instead of swallowing the error so the JS promise settles.
    promise.reject("text_recognition", e);
  }
}
3. Implement TextDetectionModule for iOS
Add the ML Kit text recognition dependency to your Podfile:
pod 'GoogleMLKit/TextRecognition', '3.2.0'
Similar to the Android approach, provide an image URL as input to your native method, which can be invoked from JavaScript. Then, create an MLKVisionImage object using a UIImage or CMSampleBuffer, ensuring that you set its .orientation property:
MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
visionImage.orientation = image.imageOrientation;
Following that, initialize a TextRecognizer instance with the options related to the SDK we previously declared as a dependency:
MLKTextRecognizerOptions *latinOptions = [[MLKTextRecognizerOptions alloc] init];
MLKTextRecognizer *textRecognizer = [MLKTextRecognizer textRecognizerWithOptions:latinOptions];
Then, process the image by passing it to the processImage:completion: method:
[textRecognizer processImage:visionImage
                  completion:^(MLKText *_Nullable result,
                               NSError *_Nullable error) {
  if (error != nil || result == nil) {
    reject(@"text_recognition", @"Text recognition failed", nil);
    return;
  }
  NSMutableDictionary *response = [NSMutableDictionary dictionary];
  // extract the recognized text blocks here (shown below)
  resolve(response);
}];
Finally, extract the text from the blocks of recognized text. If the text recognition operation succeeds, it returns an MLKText object:
NSMutableDictionary *response = [NSMutableDictionary dictionary];
NSMutableArray *blocks = [NSMutableArray array];
for (MLKTextBlock *block in result.blocks) {
  NSMutableDictionary *blockDict = [NSMutableDictionary dictionary];
  [blockDict setValue:block.text forKey:@"text"];
  [blocks addObject:blockDict];
}
[response setValue:blocks forKey:@"blocks"];
resolve(response);
Putting it all together with ML Kit, here is the complete recognizeImage native method for text detection in images on iOS. (The matching RCTTextDetectionModule.h header simply declares the class as conforming to RCTBridgeModule.)
RCTTextDetectionModule.m
#import <Foundation/Foundation.h>
#import "RCTTextDetectionModule.h"
#import <React/RCTLog.h>
#import <AVFoundation/AVFoundation.h>
#import <VisionKit/VisionKit.h>
#import <Vision/Vision.h>
@import MLKit;

@implementation RCTTextDetectionModule

RCT_EXPORT_MODULE();

RCT_EXPORT_METHOD(recognizeImage:(NSString *)url
                  resolver:(RCTPromiseResolveBlock)resolve
                  rejecter:(RCTPromiseRejectBlock)reject)
{
  RCTLogInfo(@"URL: %@", url);
  NSURL *_url = [NSURL URLWithString:url];
  NSData *imageData = [NSData dataWithContentsOfURL:_url];
  UIImage *image = [UIImage imageWithData:imageData];
  MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
  visionImage.orientation = image.imageOrientation;
  MLKTextRecognizerOptions *latinOptions = [[MLKTextRecognizerOptions alloc] init];
  MLKTextRecognizer *textRecognizer = [MLKTextRecognizer textRecognizerWithOptions:latinOptions];
  [textRecognizer processImage:visionImage
                    completion:^(MLKText *_Nullable result,
                                 NSError *_Nullable error) {
    if (error != nil || result == nil) {
      reject(@"text_recognition", @"Text recognition failed", nil);
      return;
    }
    NSMutableDictionary *response = [NSMutableDictionary dictionary];
    NSMutableArray *blocks = [NSMutableArray array];
    for (MLKTextBlock *block in result.blocks) {
      NSMutableDictionary *blockDict = [NSMutableDictionary dictionary];
      [blockDict setValue:block.text forKey:@"text"];
      [blocks addObject:blockDict];
    }
    [response setValue:blocks forKey:@"blocks"];
    resolve(response);
  }];
}

@end
4. Access the native modules from the React Native side
Now that we've finished implementing the native methods, we can access and utilize them in our React Native application via NativeModules:
import { NativeModules } from 'react-native';
const { TextDetectionModule } = NativeModules;
export const recognizeImage = (url: string) => {
  return TextDetectionModule.recognizeImage(url);
};
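To tie everything together, here is an illustrative sketch of using this helper alongside the image picker from earlier (the pickAndRecognize name and the inline block type are ours; the blocks array mirrors the shape both native modules resolve with):
// Pick a photo, run text detection on it, and log each recognized block.
const pickAndRecognize = async (): Promise<void> => {
  const uri = await pickImage(); // from the image picker sketch above
  if (!uri) {
    return; // the user cancelled the picker
  }
  const result = await recognizeImage(uri);
  result.blocks.forEach((block: { text: string }) => {
    console.log(block.text);
  });
};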