Query Construction

Query Syntax and Execution

Two syntaxes are available for constructing queries: an “operator” syntax using Python’s comparators, and a “fluent” syntax where terms are chained together. Which to use is a matter of preference, and both construct the same query object.

Operator Syntax

Searches are built up from a series of Terminal nodes, which compare structural attributes to some search value. In the operator syntax, Python’s comparator operators are used to construct the comparison. The operators are overloaded to return Terminal objects for the comparisons.

Here is an example from the RCSB PDB Search API page created using the operator syntax. This query finds symmetric dimers having a twofold rotation with the DNA-binding domain of a heat-shock transcription factor.

from rcsbsearchapi.search import TextQuery
from rcsbsearchapi import rcsb_attributes as attrs

# Create terminals for each query
q1 = TextQuery("heat-shock transcription factor")
q2 = attrs.rcsb_struct_symmetry.symbol == "C2"
q3 = attrs.rcsb_struct_symmetry.kind == "Global Symmetry"
q4 = attrs.rcsb_entry_info.polymer_entity_count_DNA >= 1

Attributes are available from the rcsb_attributes object and can be tab-completed. They can additionally be constructed from strings using the Attr (attribute) constructor.
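
To make the overloading concrete, here is a minimal, self-contained sketch of how comparison operators can return terminal objects instead of booleans. SketchAttr and SketchTerminal are hypothetical stand-ins invented for this illustration and are not the package's own classes; the operator names (exact_match, greater, greater_or_equal) are the ones used elsewhere on this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchTerminal:
    """Hypothetical terminal node: one attribute compared to one value."""
    attribute: str
    operator: str
    value: Any
    negation: bool = False


class SketchAttr:
    """Hypothetical attribute whose comparators build terminals, not booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> SketchTerminal:  # type: ignore[override]
        return SketchTerminal(self.name, "exact_match", value)

    def __ne__(self, value: Any) -> SketchTerminal:  # type: ignore[override]
        return SketchTerminal(self.name, "exact_match", value, negation=True)

    def __gt__(self, value: Any) -> SketchTerminal:
        return SketchTerminal(self.name, "greater", value)

    def __ge__(self, value: Any) -> SketchTerminal:
        return SketchTerminal(self.name, "greater_or_equal", value)


symbol = SketchAttr("rcsb_struct_symmetry.symbol")
dna_count = SketchAttr("rcsb_entry_info.polymer_entity_count_DNA")

t1 = symbol == "C2"   # a SketchTerminal, not True/False
t2 = dna_count >= 1
print(t1)
print(t2)
```

Because each comparison only records what should be compared, no search is performed at this point; the resulting objects can be combined and executed later.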

List of supported comparative operators:

Operator    Description
==          is
!=          is not
>           greater than
>=          greater than or equal to
<           less than
<=          less than or equal to
in          contains phrase or contains words

To use the exists operator, create an AttributeQuery.

For methods to search and find details on attributes within this package, go to the attributes page. For a full list of attributes, please refer to the RCSB PDB schema.

Individual Terminals are combined into Groups using Python's bitwise operators. This is analogous to how bitwise operators act on Python set objects. The operators are lazy: they only build the query, and no search is performed until the query is executed.

query = q1 & (q2 & q3 & q4)  # AND of all queries

AND (&), OR (|), and terminal negation (~) are implemented directly by the API. The Python package additionally implements set difference (-), symmetric difference (^), and general negation by transforming the query.

List of supported bitwise operators:

Operator    Description
&           AND
|           OR
~           NOT
^           XOR/symmetric difference
-           set difference
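
The query transformation mentioned above can be pictured with standard boolean identities: a - b is equivalent to a & ~b, and a ^ b is equivalent to (a & ~b) | (~a & b). The sketch below checks these identities on plain Python sets of made-up identifiers, with negation taken relative to a small universe. It illustrates the idea only and is not necessarily how the package rewrites queries internally.

```python
# Made-up identifiers standing in for search results.
universe = {"ID1", "ID2", "ID3", "ID4", "ID5"}
a = {"ID1", "ID2", "ID3"}
b = {"ID3", "ID4"}


def negate(s: set) -> set:
    """NOT, relative to the universe of all possible results."""
    return universe - s


# Set difference expressed with only AND and NOT.
difference = a & negate(b)

# Symmetric difference expressed with only AND, OR, and NOT.
symmetric = (a & negate(b)) | (negate(a) & b)

print(sorted(difference))  # same members as a - b
print(sorted(symmetric))   # same members as a ^ b
```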

Queries are executed by calling them as functions. They return an iterator of result identifiers.

# Call the query to execute it
results = query()

for rid in results:
    print(rid)
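
Since the operators only build a tree, nothing is sent until the query is executed. As a rough picture of what that tree corresponds to, the sketch below assembles a Search API style JSON request by hand for the AND query above. The terminal() and group() helpers are hypothetical and written only for this example; the node layout follows the Search API's terminal and group nodes as best understood here, so treat the exact field names as assumptions and compare them with the Search API documentation.

```python
import json


def terminal(service: str, **parameters) -> dict:
    """Build a terminal node (hypothetical helper, not a package function)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(logical_operator: str, *nodes: dict) -> dict:
    """Build a group node joining child nodes with "and" or "or"."""
    return {
        "type": "group",
        "logical_operator": logical_operator,
        "nodes": list(nodes),
    }


q1 = terminal("full_text", value="heat-shock transcription factor")
q2 = terminal("text", attribute="rcsb_struct_symmetry.symbol",
              operator="exact_match", value="C2")
q3 = terminal("text", attribute="rcsb_struct_symmetry.kind",
              operator="exact_match", value="Global Symmetry")
q4 = terminal("text", attribute="rcsb_entry_info.polymer_entity_count_DNA",
              operator="greater_or_equal", value=1)

# Mirrors: query = q1 & (q2 & q3 & q4), returning entry IDs.
request = {
    "query": group("and", q1, group("and", q2, q3, q4)),
    "return_type": "entry",
}

print(json.dumps(request, indent=2))
```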

By default, the query returns "entry" results (PDB IDs). It is also possible to request other types of results (see Return Types for options):

# Set return_type to "assembly" when executing
results = query(return_type="assembly")

for assembly_id in results:
    print(assembly_id)

Fluent Syntax

The operator syntax is great for simple queries, but requires parentheses or temporary variables for complex nested queries. In these cases the fluent syntax may be clearer. Queries are built up by appending operations sequentially.

Here is the same example using the fluent syntax:

from rcsbsearchapi.search import TextQuery, AttributeQuery, Attr

# Start with an Attr or TextQuery, then add terms
results = TextQuery("heat-shock transcription factor").and_(
    # Add attribute node as fully-formed AttributeQuery
    AttributeQuery(
        attribute="rcsb_struct_symmetry.symbol",
        operator="exact_match",
        value="C2"
    )

    # Add attribute node as Attr with chained operations
    # Setting type to "text" specifies that it's a Structure Attribute
    .and_(Attr(
        attribute="rcsb_struct_symmetry.kind",
        type="text"
    )).exact_match("Global Symmetry")

    # Add attribute node by name (converted to Attr) with chained operations
    .and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1)

    # Execute the query and return assembly ids
    ).exec(return_type="assembly")

# Exec produces an iterator of IDs
for assembly_id in results:
    print(assembly_id)

Grouping Sub-Queries

Grouping of Structural Attribute and Chemical Attribute queries is permitted. More details on attributes that are available for attribute searches can be found on the RCSB PDB Search API page.

from rcsbsearchapi.search import AttributeQuery

# Query for structures determined by electron microscopy
q1 = AttributeQuery(
    attribute="exptl.method",
    operator="exact_match",
    value="electron microscopy"
)

# Drugbank annotations contain phrase "tylenol"
q2 = AttributeQuery(
    attribute="drugbank_info.brand_names",
    operator="contains_phrase",
    value="tylenol"
)

# Combine queries with AND
query = q1 & q2

list(query())

Sessions

The result of executing a query (either by calling it or using exec()) is a Session object. It implements __iter__, so it is usually treated just as an iterator of IDs.

Paging is handled transparently by the session, with additional API requests made lazily as needed. The page size can be controlled with the rows parameter.

first = next(iter(query(rows=1)))
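
The lazy paging described above can be sketched with a generator that requests the next page only when the previous one has been consumed. Everything here is a stand-in: fetch_page() is a hypothetical function that fakes an API request against a made-up list of IDs, and iter_results() is not the package's Session implementation.

```python
from typing import Iterator, List

ALL_IDS = [f"ID{i}" for i in range(1, 8)]  # made-up results
requests_made = []  # records each (start, rows) "request" for inspection


def fetch_page(start: int, rows: int) -> List[str]:
    """Hypothetical stand-in for one API request returning a page of IDs."""
    requests_made.append((start, rows))
    return ALL_IDS[start:start + rows]


def iter_results(rows: int) -> Iterator[str]:
    """Yield IDs one at a time, requesting the next page only when needed."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            return
        yield from page
        if len(page) < rows:
            return  # a short page means there is nothing left to fetch
        start += rows


# Taking only the first result triggers a single "request".
first = next(iter_results(rows=3))
print(first, requests_made)
```

Iterating the generator to exhaustion would issue further requests, one per page, which is why a small rows value means more round trips for large result sets.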

Progress Bar

The Session.iquery() method provides a progress bar that tracks the API requests being made, which is useful when running queries interactively. It requires the tqdm package to be installed.

results = query().iquery()

Search Service Types

The supported search service types are listed in the table below.

Search service                       QueryType
Full-text                            TextQuery()
Attribute (structure or chemical)    AttributeQuery()
Sequence similarity                  SequenceQuery()
Sequence motif                       SequenceMotifQuery()
Structure similarity                 StructSimilarityQuery()
Structure motif                      StructMotifQuery()
Chemical similarity                  ChemSimilarityQuery()

Learn more about available search services on the RCSB PDB Search API docs.

Request Options

Return Types

A search query can return different result types when a return type is specified. Below are Structure Attribute query examples that specify the return types Polymer Entities, Non-polymer Entities, Polymer Instances, and Molecular Definitions. More information on return types can be found on the RCSB PDB Search API page.

from rcsbsearchapi.search import AttributeQuery

# query for 4HHB deoxyhemoglobin
q1 = AttributeQuery(
    attribute="rcsb_entry_container_identifiers.entry_id",
    operator="in",
    value=["4HHB"]
)

# Polymer entities
for poly in q1(return_type="polymer_entity"):
    print(poly)
    
# Non-polymer entities
for nonPoly in q1(return_type="non_polymer_entity"):
    print(nonPoly)
    
# Polymer instances
for polyInst in q1(return_type="polymer_instance"):
    print(polyInst)
    
# Molecular definitions
for mol in q1(return_type="mol_definition"):
    print(mol)

Computed Structure Models

The RCSB PDB Search API page provides information on how to include Computed Structure Models (CSMs) in a search query.

The query below returns IDs for both experimental and computed structure models associated with "hemoglobin". Queries can also be restricted to only computed models or to only experimental models (the default).

from rcsbsearchapi.search import TextQuery

q1 = TextQuery(value="hemoglobin")

# add parameter as a list with either "computational" or "experimental" or both
q2 = q1(return_content_type=["computational", "experimental"])

list(q2)

Results Verbosity

Results can be returned alongside additional metadata, including result scores. To return this metadata, set the results_verbosity parameter to “verbose” (all metadata), “minimal” (scores only), or “compact” (default, no metadata). If set to “verbose” or “minimal”, results will be returned as a list of dictionaries.

For example, here we get all experimental models associated with “hemoglobin”, along with their scores.

from rcsbsearchapi.search import TextQuery

q1 = TextQuery(value="hemoglobin")
for idscore in list(q1(results_verbosity="minimal")):
    print(idscore)
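
When results come back as dictionaries, they can be post-processed like any other list. The sketch below assumes each dictionary carries "identifier" and "score" keys, as in the Search API's result set. The sample values are made up for illustration, so check the keys against real output before relying on them.

```python
# Made-up sample shaped like "minimal" verbosity output (assumed keys).
sample_results = [
    {"identifier": "ID_A", "score": 0.42},
    {"identifier": "ID_B", "score": 1.0},
    {"identifier": "ID_C", "score": 0.87},
]

# Keep only hits at or above a score threshold, best first.
threshold = 0.5
top_hits = sorted(
    (r for r in sample_results if r["score"] >= threshold),
    key=lambda r: r["score"],
    reverse=True,
)

for hit in top_hits:
    print(f'{hit["identifier"]}\t{hit["score"]:.2f}')
```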

Count Queries

If only the number of results is desired, the count request option can be used. This query returns the number of experimental models associated with “hemoglobin”.

from rcsbsearchapi.search import TextQuery

q1 = TextQuery(value="hemoglobin")

result_count = q1(return_counts=True)
print(result_count)

Faceted Queries

Faceted queries (or facets) group search results and compute statistics on PDB data using a simple search query. Facets arrange search results into categories (buckets) based on the requested field values. More information on faceted queries can be found in the Search API documentation.

Every facet must be given name, aggregation_type, and attribute values. Depending on the aggregation type, other parameters must also be specified. To run a faceted query, create a Facet object and pass it, either on its own or in a list, to the facets argument when executing the query.

from rcsbsearchapi.search import AttributeQuery, Facet, Range

q = AttributeQuery(
    attribute="rcsb_accession_info.initial_release_date",
    operator="greater",
    value="2019-08-20",
)

q_result = q(facets=Facet(
    name="Methods",
    aggregation_type="terms",
    attribute="exptl.method"
))
print(q_result.facets)
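
Conceptually, a terms facet counts how many results share each value of the requested attribute. The following sketch reproduces that bucketing locally with collections.Counter on made-up records. The real aggregation happens on the server, and the bucket layout used here (label and population keys) is an assumption for illustration rather than the confirmed structure of q_result.facets.

```python
from collections import Counter

# Made-up records: (identifier, experimental method).
records = [
    ("ID1", "X-RAY DIFFRACTION"),
    ("ID2", "ELECTRON MICROSCOPY"),
    ("ID3", "X-RAY DIFFRACTION"),
    ("ID4", "SOLUTION NMR"),
    ("ID5", "X-RAY DIFFRACTION"),
]

# Count results per attribute value, as a terms aggregation would.
counts = Counter(method for _, method in records)

# Arrange the counts as labeled buckets, most populated first.
buckets = [
    {"label": label, "population": n} for label, n in counts.most_common()
]

for bucket in buckets:
    print(bucket)
```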

List of available types of Faceted queries:

  • Terms Facet

  • Histogram Facet

  • Range Facet

  • Date Range Facet

  • Cardinality Facet

  • Multidimensional Facet

  • Filter Facet

See example usage of each of these types of Faceted queries at Faceted Query Examples.

Additional Request Options

Other request options can also be added to queries through arguments at execution. facets, group_by, and sort are more complex request options and require creating a RequestOption object (Facet, GroupBy, Sort).

List of available request options:

  • results_content_type

  • results_verbosity

  • return_counts

  • facets

  • group_by

  • group_by_return_type

  • sort

  • return_explain_metadata

  • scoring_strategy

Some request options are not exposed by the package:

  • paginate: handled automatically by the package, which pages through the results and returns all of them

  • return_all_hits: not implemented, since all results are already returned

For more information on what each request option does, refer to the Search API documentation.

For information on how to create RequestOption objects, see the API reference.