Writing Transforms in Python - Part 2

Full Series

Part 1 - The Default Transform & Working with Records
Part 2 - Reading from Records
Part 3 - Writing to Records
Part 4 - Removing Fields from a Record
Part 5 - Working with Strings in Python

Reading from Records

There are two main ways to access the individual values (or “fields”) of a Record:

Indexing (a.k.a. Subscript Notation)

We can look up a field directly on record using square brackets and the field’s name, which is always a case-sensitive string. For example, to look up a field called “Username” and assign it to a variable:

username = record["Username"]

This is very simple, but if the referenced field doesn’t exist, it will raise an Exception (a KeyError), which causes an error and stops your Transform. This isn’t necessarily a bad thing, but it is a tradeoff: if you expect a field to exist, and its absence represents a problem, this syntax makes that very clear and prevents your Transform from running without it!
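To see this in action outside of a Transform, here is a minimal sketch that assumes a record behaves like a plain Python dictionary. The sample record and its values are made up for illustration. Note how a differently-cased field name counts as a missing field:

```python
# Stand-in for a real record; assumes records behave like Python dictionaries
record = {"Username": "ada", "Email": "ada@example.com"}

username = record["Username"]
print(username)  # prints: ada

try:
    # Field names are case-sensitive, so "username" is a different (missing) field
    record["username"]
except KeyError as error:
    print("Missing field:", error)  # prints: Missing field: 'username'
```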

The get() Method

If a field is optional, or otherwise isn’t expected to be present in every record, the “get” method is a convenient option:

username = record.get("Username")

# If record has no "Username" field, username will be left as None

If “Username” is available, this behaves the same as the square-bracket syntax above. However, if a record has no “Username” field, there won’t be an error: get() returns None instead. (None is Python’s equivalent of NULL. Don’t worry too much about the distinction, as None values will automatically become NULL values when written out to a JSON file, a SQL database, or another non-Python service.)

💡

If you aren’t familiar with them, Python’s None (and JSON’s null) are special values used to indicate a “lack of data”: when actual data is missing, unavailable, or hasn’t arrived yet.
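Here is a small sketch of both behaviors, again assuming dict-like records with made-up values. It uses Python’s standard json module purely to illustrate how None is written out as JSON’s null. How your Transform’s output is actually serialized depends on where it’s going:

```python
import json

# Stand-in record with no "Username" field (assumes dict-like records)
record = {"Email": "ada@example.com"}

username = record.get("Username")
print(username is None)  # prints: True

# When serialized to JSON, Python's None is written out as null
print(json.dumps({"Username": username}))  # prints: {"Username": null}
```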

The “get” method also allows us to specify a default, by adding a second argument:

username = record.get("Username", "(unknown?)")

# If record has no "Username" field, username will be set to "(unknown?)"

This changes the behavior when the “Username” field is missing from a record: instead of None, get() now returns the default value “(unknown?)”. This is very useful if we want to enforce a default for a field that isn’t always available, or otherwise improve the consistency of our data.
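One detail worth knowing, sketched here under the assumption that records behave like Python dictionaries (the sample records are made up): the default is only used when the field is missing entirely. If the field exists but holds None, get() returns that None rather than the default:

```python
# Stand-in records (assumes dict-like records)
missing = {"Email": "ada@example.com"}
present_but_none = {"Username": None}

# The default is used when the field is absent...
print(missing.get("Username", "(unknown?)"))  # prints: (unknown?)

# ...but not when the field exists and holds None
print(present_but_none.get("Username", "(unknown?)"))  # prints: None
```

If both cases should be treated the same way, the “is None” check described in the next section covers them.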

A Customized Approach

If you need to do something more complicated when a field isn’t present, you can combine the “get” method with some additional logic:

username = record.get("Username")
if username is None:
    # Do something more interesting here!
    pass  # placeholder: Python requires at least one statement in this block

Here we attempt to get the value of “Username” from record. If it isn’t available, our username variable will be set to None. We then check for this with an “if” statement that tests whether username “is” None; inside that branch, we can implement more logic to decide what username should be set to.
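As one possible example of “more logic”, here is a sketch that falls back to deriving a username from an email address. The “Email” field, the resolve_username helper, and the fallback rules are all hypothetical, chosen only to illustrate the pattern:

```python
def resolve_username(record):
    """Return a username for record, falling back when "Username" is missing.

    The "Email" field and the fallback rules are hypothetical examples.
    """
    username = record.get("Username")
    if username is None:
        email = record.get("Email")  # hypothetical fallback field
        if email is not None and "@" in email:
            # Use the part of the email address before the "@"
            username = email.split("@", 1)[0]
        else:
            username = "(unknown?)"
    return username


print(resolve_username({"Username": "ada"}))             # prints: ada
print(resolve_username({"Email": "grace@example.com"}))  # prints: grace
print(resolve_username({}))                              # prints: (unknown?)
```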

💡

Python has a few different ways to compare things:

== checks for equality: whether two things are considered equal.

is checks for identity: whether two things are “exactly the same thing”, not merely equivalent to each other.

This is important because other objects in Python can compare as equal to None (a class can define its own == behavior), but our condition if username is None will only be True if username is exactly, specifically None.
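A contrived sketch of the difference: the AlwaysEqual class below is hypothetical and exists only to show that == can be customized while “is” cannot. The empty-string lines show a related pitfall: a truthiness check like if not username would treat "" the same as None, while “is None” does not:

```python
class AlwaysEqual:
    """A contrived class whose instances claim to equal anything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # prints: True  (equality can be customized; deliberate for illustration)
print(value is None)  # prints: False (identity cannot)

empty = ""
print(empty is None)  # prints: False
print(not empty)      # prints: True  (an empty string is "falsy", but it isn't None)
```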

A Slight Shortcut using “in”

If we just want to know whether or not a field is present, without trying to read its value or assign it to a variable, we can use the in operator:

if "Username" in record:
    # Do something that relies on record["Username"] here...
    pass  # placeholder: Python requires at least one statement in this block

(This checks whether the “Username” key is present in the record dictionary, without accessing or retrieving its value.)

Sometimes it’s helpful to check for the opposite, whether the “Username” field is not present:

if "Username" not in record:
    # Do something if record["Username"] is missing here...
    pass  # placeholder: Python requires at least one statement in this block

Compared with calling get() and checking for None, this is good for two reasons: it’s less code, making our Transform shorter and easier to read, and it more clearly expresses our intentions, which will help other people who are trying to understand this Transform. (That may include you, in the future!)
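As a sketch of how “not in” reads in practice, here is a hypothetical helper that lists which required fields a record is missing. The REQUIRED_FIELDS list and the missing_fields name are illustrative assumptions, and records are again assumed to behave like dictionaries. The last line shows that “in” only checks whether the key exists, so a field holding None still counts as present:

```python
# Hypothetical list of fields our Transform requires
REQUIRED_FIELDS = ["Username", "Email"]


def missing_fields(record):
    """Return the required field names that are absent from record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


print(missing_fields({"Username": "ada", "Email": "ada@example.com"}))  # prints: []
print(missing_fields({"Email": "grace@example.com"}))                 # prints: ['Username']

# "in" checks for the key only, so a field holding None still counts as present
print("Username" in {"Username": None})  # prints: True
```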

Given what we’ve learned: Could a Transform filter out records without a “Username”?


Yes, see below:

def transform(records):
    for dataset_id, record_id, record in records:
        if "Username" in record:
            yield dataset_id, record_id, record
        # (otherwise, we won't yield the record, filtering it out...)
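An equivalent variant of that answer uses “not in” with continue to skip records early. Below is a sketch of it, plus a quick way to try it locally. This assumes, as the loop above does, that the Transform receives an iterable of (dataset_id, record_id, record) tuples; the sample dataset IDs, record IDs, and values are made up:

```python
def transform(records):
    for dataset_id, record_id, record in records:
        if "Username" not in record:
            continue  # skip (filter out) records without a "Username"
        yield dataset_id, record_id, record


# Made-up sample input: (dataset_id, record_id, record) tuples
sample = [
    ("users", 1, {"Username": "ada"}),
    ("users", 2, {"Email": "grace@example.com"}),
    ("users", 3, {"Username": "linus"}),
]

kept = list(transform(sample))
print([record_id for _, record_id, _ in kept])  # prints: [1, 3]
```

Both versions behave the same; which one reads better mostly depends on how much logic follows the check.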

Continue reading Part 3 of this series, where we dive into writing to records.
