Cloud Video Processing: Street Signals Detection and Recognition

Monteiro Del Prete
4 min read · Apr 9, 2021


I’d like to share the idea behind the project I built with Teoresi Group for my Bachelor’s degree in Computer Science. I think every experience can be helpful regardless of its complexity or practical utility, so I hope this helps somebody, somewhere, someday.

What is present:

  • Architecture illustrations and analysis
  • Brief description of an image classifier

What is not present:

  • The project’s source code
  • V2X technologies
  • Soft real-time validation and local testing of a serverless application
  • Communication privacy solutions
  • Cloud resource allocation

Finally, I’m not going to follow the structure of my thesis, which would be heavily academic. Instead I’ll analyze the underlying architecture in order to share the main idea and, hopefully, a starting point for somebody. Here is the content map:

a) Objective

b) Possible solution

c) Architecture

a) You have an input video that represents an acquisition of a street’s profile, and you want to analyze it to check for street signals (road signs). Once that is done, and assuming some were found, the reprocessed sequence of frames must have a recognition area drawn on it. For example, for one frame the result should be the frame itself with the detected signal outlined by a rectangle.

Thus we don’t care who sends us the video or who will receive it to show it on a vehicle; we focus on what happens inside this communication. In other words, during development we can simulate both the client that makes the acquisition and the one that expects the information. The main idea was to create a network whose aim is to get the video, detect possible “Men at Work” signals, and notify connected clients of their presence in order to promote adaptive routing.

b) There are many solutions for that, but we want to be independent of the underlying hardware, so that the computational power can be redefined on demand. In particular, our network is a serverless application in the cloud: every time we send a video, we get a response in a soft real-time way. What we need is a combination of specific cloud services that makes this possible.

c) We selected a suite of AWS cloud services based on three main needs:

  1. data storing
  2. data computing
  3. data communication

Here the term “data” means the video’s units, the frames. On this basis we arrange some cloud services in this way:

Cloud Services Architecture

The red arrows represent the inter-service communication; the grey ones are on-board V2X applications that request and send data (outside our scope). Let’s describe the workflow (each list number corresponds to the arrow number in the architecture):

  1. The first car acquires the street profile (through an on-board camera, for example) and sends it to a previously created S3 bucket, input_bucket. (“First car” means a V2X application, which we can simulate with a simple script that captures video from a PC webcam.)
  2. The upload of the video to S3 triggers a lambda function, the main actor of this architecture. It hosts the classifier, whose aim is to process the video, analyze it, and return the same video with the signal detection drawn on it.
  3. After the analysis, the lambda uploads the processed video to a different S3 bucket, output_bucket.
  4. To notify the passive clients waiting for this kind of information, the lambda function relies on IoT Core. This service is based on the MQTT protocol, which follows the publish-subscribe paradigm. Once the “send topics” and “listening topics” are configured, the lambda can use them to announce that the analysis is complete.
  5. The message reaches the passive client in a specific format known a priori (DENM), containing information such as the API for output_bucket, the location of the signal (where the active client acquired the street profile), the name of the processed file, and so on.
  6. Thanks to this information, the passive client (a V2X application) can download the processed video from S3 and, for example, show it on board to the driver or show the signal position on the navigator.
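The workflow above can be sketched as a single Lambda handler. This is a minimal sketch under stated assumptions, not the project's code: the bucket and topic names are hypothetical (hyphenated, because S3 bucket names cannot contain underscores), `copy_video` is only a placeholder for the classifier, and the S3 and IoT clients are passed in so the flow can be tried locally with fakes.

```python
import json
import os
import shutil
from urllib.parse import unquote_plus

# Hypothetical names, read from the environment when available.
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "output-bucket")
NOTIFY_TOPIC = os.environ.get("NOTIFY_TOPIC", "signals/processed")


def uploaded_object(event):
    """Return (bucket, key) of the object that triggered the S3 event."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def copy_video(src, dst):
    """Placeholder for the classifier: copies the video unchanged."""
    shutil.copyfile(src, dst)


def handle(event, s3, iot, process_video):
    """Download the uploaded video, process it, store it and notify."""
    bucket, key = uploaded_object(event)
    name = os.path.basename(key)
    local_in = os.path.join("/tmp", "in_" + name)    # /tmp is writable in Lambda
    local_out = os.path.join("/tmp", "out_" + name)

    s3.download_file(bucket, key, local_in)          # arrow 2: fetch the input
    process_video(local_in, local_out)               # run the classifier
    s3.upload_file(local_out, OUTPUT_BUCKET, key)    # arrow 3: store the result

    message = {"bucket": OUTPUT_BUCKET, "file": key}
    iot.publish(topic=NOTIFY_TOPIC, qos=1, payload=json.dumps(message))  # arrow 4
    return message


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda when a video lands in the input bucket."""
    import boto3  # provided by the AWS Lambda Python runtime

    return handle(event, boto3.client("s3"), boto3.client("iot-data"), copy_video)
```

Keeping `handle` free of direct AWS calls is a design choice for local testing: any object with `download_file`, `upload_file` and `publish` methods can stand in for the real services.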

The S3 and IoT Core services are easy to configure. For the first, it is enough to create an input_bucket (which receives the video to process) and an output_bucket (which receives the processed video). For the second, a suite of topics dedicated to notification and a message structure must be defined. The Lambda function is our classifier combined with other tools, whose aim is to detect and recognize signals.
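To make the “topics and message structure” part concrete, here is a small sketch of a notification message and its topic. Everything in it is an assumption for illustration: the topic name, the field names and the JSON encoding are hypothetical, and a real DENM is a standardized ETSI message with its own encoding, so this dictionary is only a simplified stand-in that carries the kind of information listed above.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical topic the lambda publishes to and passive clients subscribe to.
TOPIC_PROCESSED = "v2x/signals/processed"


@dataclass
class Notification:
    """Simplified stand-in for the DENM-like message sent to passive clients."""
    bucket_api: str      # where the processed video can be downloaded
    file_name: str       # name of the processed file in the output bucket
    latitude: float      # where the street profile was acquired
    longitude: float
    signal_type: str = "men_at_work"

    def to_payload(self) -> str:
        """Serialize the message for an MQTT publish."""
        return json.dumps(asdict(self))

    @classmethod
    def from_payload(cls, payload):
        """Rebuild the message on the passive client (accepts str or bytes)."""
        return cls(**json.loads(payload))
```

A passive client would subscribe to `TOPIC_PROCESSED`, call `Notification.from_payload` on each incoming payload, and then use `bucket_api` and `file_name` to fetch the processed video.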

You can obtain a classifier as a black box or define it yourself. Personally, and in short, I used a Haar Cascade Classifier (HCC) for detection, the SURF algorithm for feature extraction, and a BF (brute-force) matcher for feature matching (Python libraries). The classifier was trained for “Men at Work” signals and adapted to a lambda function.

Thus the idea is to unpack the video and apply the classifier to every single frame. Each frame is processed by the lambda function in this way:

  • object detection: the HCC detects all the objects it has been trained for (“Men at Work” signals).
  • object recognition: the SURF algorithm extracts features from the detected object and from the reference image. Then the BF matcher performs feature matching between the two. If the match exceeds a threshold, the frame contains the signal of interest, which is marked with a blue rectangle.
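The two steps above could look roughly like this with OpenCV. It is a sketch under several assumptions rather than the original implementation: the ratio test and the values `RATIO`, `MIN_GOOD_MATCHES`, `scaleFactor`, `minNeighbors` and the Hessian threshold are illustrative guesses at what “exceeds a threshold” could mean, and the file paths are supplied by the caller. SURF lives in the opencv-contrib package (`cv2.xfeatures2d`) and needs a build with the non-free algorithms enabled.

```python
RATIO = 0.75            # assumed ratio-test value, not taken from the project
MIN_GOOD_MATCHES = 10   # assumed threshold for declaring a recognized signal


def count_good_matches(knn_matches, ratio=RATIO):
    """Count matches whose best distance is clearly below the second best."""
    good = 0
    for pair in knn_matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good


def build_recognizer(cascade_path, reference_path):
    """Load the trained cascade and the descriptors of the reference image."""
    import cv2  # imported here so the helper above works without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    surf = cv2.xfeatures2d.SURF_create(400)
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    _, ref_descriptors = surf.detectAndCompute(reference, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    return cascade, surf, matcher, ref_descriptors


def annotate_frame(frame, cascade, surf, matcher, ref_descriptors):
    """Detect candidates with the cascade, confirm them with SURF + BF matching."""
    import cv2

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Object detection: every region the cascade was trained to find.
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        region = gray[y:y + h, x:x + w]
        # Object recognition: compare the region's features with the reference.
        _, descriptors = surf.detectAndCompute(region, None)
        if descriptors is None:
            continue
        matches = matcher.knnMatch(descriptors, ref_descriptors, k=2)
        if count_good_matches(matches) >= MIN_GOOD_MATCHES:
            # OpenCV colors are BGR, so (255, 0, 0) is blue.
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    return frame
```

The cascade is cheap and runs on the whole frame, while the more expensive feature matching only runs on the few regions it proposes.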

Repeating these steps for every frame in the video and reassembling the frames in their original order produces the processed output video.
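The unpack, process and reassemble loop can be sketched as follows. Again this is only an illustration with assumed details: the `mp4v` codec, the 25 fps fallback and the function names are hypothetical, and `annotate_frame` stands for whatever per-frame classifier is plugged in (any function that takes a frame and returns it).

```python
def read_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until it runs out."""
    while True:
        ok, frame = capture.read()
        if not ok:
            return
        yield frame


def annotate_stream(frames, writer, annotate_frame):
    """Annotate frames one by one and write them in their original order."""
    count = 0
    for frame in frames:
        writer.write(annotate_frame(frame))
        count += 1
    return count


def annotate_video(in_path, out_path, annotate_frame):
    """Rebuild the input video with every frame passed through annotate_frame."""
    import cv2  # imported here so the helpers above can be tested without OpenCV

    capture = cv2.VideoCapture(in_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if fps is unknown
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    try:
        return annotate_stream(read_frames(capture), writer, annotate_frame)
    finally:
        capture.release()
        writer.release()
```

Because frames are read and written sequentially, the output keeps the input order without any extra bookkeeping, and only one frame needs to be held in memory at a time.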