
Scalable Salesforce Integrations on a Shoestring Budget – Step-by-Step Guide

Creating your Talend jobs

It would take too long to write a step-by-step guide for this part, so instead I’ll share a few resources to get you started with Talend, along with a zip file containing a copy of the jobs I used for my demo. You just need to import them into your Talend Open Studio and adjust the context variables or the components.

Resources:

Download and Import the project files

Download the files from here, then right-click on “Job Designs” and use the Import Items functionality:

Adjust the DF19 context variables with your own database details and your case-safe Org ID (the 18-character one).

Talend job description


The two demo jobs are built from identical components; they differ only in which fields are parsed and mapped to the database schema.

At the beginning (top-left), we open a shared connection to the database.
We then obtain the raw message from a tFixedFlowInput component using a single input field containing the context variable AWS_SQSmessage (which will be passed as an argument to the executable JAR).
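A quick aside on that argument: standalone Talend builds accept context overrides on the command line via the --context_param flag. Assuming a job named AccountSync (my hypothetical name; Talend generates a launcher script following the <JobName>_run.sh pattern), the invocation would look roughly like:

    ./AccountSync_run.sh --context_param "AWS_SQSmessage=<raw message XML here>"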

The “Parse Accounts” component is a tExtractXMLField component which will read the XML contained in the input field and extract one row for each loop element (in our case each Notification tag):
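For orientation, the payload being parsed here is a Salesforce Outbound Message, whose SOAP body looks roughly like the sketch below (IDs shortened, one Notification element per changed record; the sf:* field tags depend on which fields you selected in the outbound message definition):

    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Body>
        <notifications xmlns="http://soap.sforce.com/2005/09/outbound">
          <OrganizationId>00D...</OrganizationId>
          <Notification>
            <Id>04l...</Id>
            <sObject xsi:type="sf:Account"
                     xmlns:sf="urn:sobject.enterprise.soap.sforce.com"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
              <sf:Id>001...</sf:Id>
              <sf:Name>Acme</sf:Name>
            </sObject>
          </Notification>
        </notifications>
      </soapenv:Body>
    </soapenv:Envelope>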

All successfully parsed rows are sent on to the “Filter OrgId” component, where we check whether the orgId contained in the messages matches the one we expect for this Context:
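If you implement this with a tFilterRow in advanced mode, the check is a plain Java boolean expression. Assuming the parsed column is named orgId and the context variable holding your 18-character Org ID is named SF_OrgId (both names are placeholders of mine), it would be something like:

    input_row.orgId != null && input_row.orgId.equals(context.SF_OrgId)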

We can then map every field to the relevant column in the database table; for simplicity I kept the names identical (warning: PostgreSQL folds unquoted identifiers to lowercase and treats quoted ones as case-sensitive, so keep your column-name casing consistent!)

tMap component from the Account sync job

Finally, we write the changed data to the database using the tDBOutput or tDBOutputBulkExec components.

NOTE:
Upsert operations are slower than plain “Insert” or “Update” actions, so I generally split them into separate insert and update flows.
tDBOutputBulkExec can be up to 10x faster because it uses PostgreSQL’s COPY command.
tDBOutputBulkExec requires temporary CSV files to be created; Lambda functions let you write up to 512 MB to the /tmp/ folder.
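For context on that speedup: instead of sending row-by-row INSERT statements over JDBC, the component stages the rows in a CSV file and bulk-loads them in a single statement, conceptually equivalent to the following (table name is a placeholder; the CSV content is streamed from the client):

    COPY accounts FROM STDIN WITH (FORMAT csv);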

How to test your jobs

You can test your Talend job by copying a message from the SQS queue and pasting it into the Context variable AWS_SQSmessage. I also use a lot of tLogRow components to watch the data flow at every stage of the ETL process (you can deactivate the ones you don’t need, so those tables don’t end up in your CloudWatch logs).

Build your Talend jobs

Talend allows you to export your jobs as executable JARs.

Save the file to your preferred location:

Now we need to create a wrapper Java application that will import the AWS SDK classes and run the main method from our JAR.
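To make the idea concrete before you flip the page, here is a minimal, hypothetical sketch of such a wrapper, written as an SQS-triggered Lambda handler. The job package and class names are made up, and I call the generated runJobInTOS() method instead of main(), because the main() Talend generates ends with System.exit(), which would kill the Lambda runtime:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.SQSEvent;

    public class TalendJobHandler implements RequestHandler<SQSEvent, Void> {

        @Override
        public Void handleRequest(SQSEvent event, Context context) {
            for (SQSEvent.SQSMessage msg : event.getRecords()) {
                // Hand the raw message body to the job through the
                // AWS_SQSmessage context parameter, exactly as on the CLI
                String[] args = new String[] {
                        "--context_param", "AWS_SQSmessage=" + msg.getBody()
                };
                // df19.accountsync_0_1.AccountSync stands in for the class
                // Talend generates when you build the job
                int exitCode = new df19.accountsync_0_1.AccountSync().runJobInTOS(args);
                if (exitCode != 0) {
                    throw new RuntimeException("Talend job failed, exit code " + exitCode);
                }
            }
            return null;
        }
    }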


Go to the next page to see how to do that.


