# Query Construction

## Query Syntax and Execution

Two syntaxes are available for constructing queries: an "operator" syntax using Python's comparators, and a "fluent" syntax where terms are chained together. Which to use is a matter of preference; both construct the same query object.

### Operator Syntax

Searches are built up from a series of `Terminal` nodes, each of which compares a structural attribute to a search value. In the operator syntax, Python's comparison operators are overloaded to return `Terminal` objects.

Here is an example from the [RCSB PDB Search API](http://search.rcsb.org/#search-example-1) page, created using the operator syntax. This query finds symmetric dimers having a twofold rotation with the DNA-binding domain of a heat-shock transcription factor.

```python
from rcsbsearchapi.search import TextQuery
from rcsbsearchapi import rcsb_attributes as attrs

# Create terminals for each query
q1 = TextQuery("heat-shock transcription factor")
q2 = attrs.rcsb_struct_symmetry.symbol == "C2"
q3 = attrs.rcsb_struct_symmetry.kind == "Global Symmetry"
q4 = attrs.rcsb_entry_info.polymer_entity_count_DNA >= 1
```

Attributes are available from the `rcsb_attributes` object and can be tab-completed. They can additionally be constructed from strings using the `Attr` (attribute) constructor.

List of supported comparison operators:

|Operator|Description                      |
|--------|---------------------------------|
|==      |is                               |
|!=      |is not                           |
|>       |greater than                     |
|>=      |greater than or equal to         |
|<       |less than                        |
|<=      |less than or equal to            |
|in      |contains phrase or contains words|

To use the `exists` operator, create an [AttributeQuery](quickstart.html#attribute-search).

For methods to search and find details on attributes within this package, go to the [attributes page](attributes.md).

For a full list of attributes, please refer to the [RCSB PDB schema](http://search.rcsb.org/rcsbsearch/v2/metadata/schema).
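
The operator overloading described above can be illustrated with a small toy model. This is only a sketch of the general technique, not the package's implementation: `ToyAttr` and `ToyTerminal` are hypothetical names, and the operator strings are used purely as labels.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyTerminal:
    """A single recorded comparison: attribute, operator name, search value."""
    attribute: str
    operator: str
    value: Any


class ToyAttr:
    """Hypothetical stand-in for an attribute; comparisons build terminals."""

    def __init__(self, name: str) -> None:
        self.name = name

    # Each comparator returns a ToyTerminal instead of a bool.
    def __eq__(self, value: Any) -> ToyTerminal:  # type: ignore[override]
        return ToyTerminal(self.name, "exact_match", value)

    def __ge__(self, value: Any) -> ToyTerminal:
        return ToyTerminal(self.name, "greater_or_equal", value)


symbol = ToyAttr("rcsb_struct_symmetry.symbol")
dna_count = ToyAttr("rcsb_entry_info.polymer_entity_count_DNA")

# Nothing is compared here; each expression just records a node.
t1 = symbol == "C2"
t2 = dna_count >= 1

print(t1)
print(t2)
```

The point of the design is that `symbol == "C2"` evaluates to a description of the comparison rather than to `True` or `False`, so the comparison can be sent to a remote service later.
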

Individual `Terminal`s are combined into `Group`s using Python's bitwise operators. This is analogous to how bitwise operators act on Python `set` objects. The operators are lazy and won't perform the search until the query is executed.

```python
query = q1 & (q2 & q3 & q4)  # AND of all queries
```

AND (`&`), OR (`|`), and terminal negation (`~`) are implemented directly by the API, but the Python package also implements set difference (`-`), symmetric difference (`^`), and general negation by transforming the query.

List of supported bitwise operators:

|Operator|Description             |
|--------|------------------------|
|&       |AND                     |
|\|      |OR                      |
|~       |NOT                     |
|^       |XOR/symmetric difference|
|-       |set difference          |

Queries are executed by calling them as functions. They return an iterator of result identifiers.

```python
# Call the query to execute it
results = query()
for rid in results:
    print(rid)
```

By default, the query will return "entry" results (PDB IDs). It is also possible to query other types of results (see [return-types](http://search.rcsb.org/#return-type) for options):

```python
# Set return_type to "assembly" when executing
results = query(return_type="assembly")
for assembly_id in results:
    print(assembly_id)
```

### Fluent Syntax

The operator syntax works well for simple queries, but requires parentheses or temporary variables for complex nested queries. In these cases the fluent syntax may be clearer. Queries are built up by appending operations sequentially.
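
To show what "appending operations sequentially" means in general, here is a minimal toy sketch of a fluent builder. `ToyQuery` and `PendingTerm` are hypothetical classes, not the package's own; the sketch only borrows the idea that every method returns an object, so calls can be chained.

```python
from typing import Tuple


class ToyQuery:
    """Hypothetical fluent builder: every method returns an object to chain on."""

    def __init__(self, terms: Tuple[str, ...] = ()) -> None:
        self.terms = terms

    def and_(self, attribute: str) -> "PendingTerm":
        # Name an attribute first; the comparison is supplied by the next call.
        return PendingTerm(self, attribute)


class PendingTerm:
    """An attribute that is still waiting for its comparison."""

    def __init__(self, query: ToyQuery, attribute: str) -> None:
        self.query = query
        self.attribute = attribute

    def exact_match(self, value: str) -> ToyQuery:
        return self._finish(f"{self.attribute} exact_match {value!r}")

    def greater_or_equal(self, value: float) -> ToyQuery:
        return self._finish(f"{self.attribute} greater_or_equal {value!r}")

    def _finish(self, term: str) -> ToyQuery:
        # Return a new query instead of mutating, so partial chains stay reusable.
        return ToyQuery(self.query.terms + (term,))


query = (
    ToyQuery()
    .and_("rcsb_struct_symmetry.symbol").exact_match("C2")
    .and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1)
)
print(" AND ".join(query.terms))
```
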

Here is the heat-shock transcription factor example from the operator syntax section, rewritten using the fluent syntax:

```python
from rcsbsearchapi.search import TextQuery, AttributeQuery, Attr

# Start with an Attr or TextQuery, then add terms
results = TextQuery("heat-shock transcription factor").and_(
    # Add attribute node as fully-formed AttributeQuery
    AttributeQuery(
        attribute="rcsb_struct_symmetry.symbol",
        operator="exact_match",
        value="C2"
    )

    # Add attribute node as Attr with chained operations
    # Setting type to "text" specifies that it's a Structure Attribute
    .and_(Attr(
        attribute="rcsb_struct_symmetry.kind",
        type="text"
    )).exact_match("Global Symmetry")

    # Add attribute node by name (converted to Attr) with chained operations
    .and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1)

    # Execute the query and return assembly ids
).exec(return_type="assembly")

# Exec produces an iterator of IDs
for assembly_id in results:
    print(assembly_id)
```

### Grouping Sub-Queries

Structure Attribute and Chemical Attribute queries can be grouped together. More details on attributes that are available for attribute searches can be found on the [RCSB PDB Search API](https://search.rcsb.org/#search-attributes) page.

```python
from rcsbsearchapi.search import AttributeQuery

# Query for structures determined by electron microscopy
q1 = AttributeQuery(
    attribute="exptl.method",
    operator="exact_match",
    value="electron microscopy"
)

# Drugbank annotations contain phrase "tylenol"
q2 = AttributeQuery(
    attribute="drugbank_info.brand_names",
    operator="contains_phrase",
    value="tylenol"
)

# Combine queries with AND
query = q1 & q2

list(query())
```

### Sessions

The result of executing a query (either by calling it or using `exec()`) is a `Session` object. It implements `__iter__`, so it is usually treated just as an iterator of IDs.

Paging is handled transparently by the session, with additional API requests made lazily as needed. The page size can be controlled with the `rows` parameter.
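
The lazy paging behaviour described above can be sketched with a plain generator. This is a toy model under stated assumptions, not the `Session` implementation: `paged_ids`, `fake_fetch`, and the ID list are hypothetical, and the "service" is just an in-memory list.

```python
from typing import Callable, Iterator, List


def paged_ids(fetch_page: Callable[[int, int], List[str]], rows: int) -> Iterator[str]:
    """Yield IDs one at a time, requesting a new page only when needed."""
    start = 0
    while True:
        page = fetch_page(start, rows)  # one "API request" per page
        yield from page
        if len(page) < rows:            # a short page means no more results
            return
        start += rows


# Hypothetical in-memory stand-in for the remote service.
ALL_IDS = ["1ABC", "2DEF", "3GHI", "4JKL", "5MNO"]
requests_made: List[int] = []


def fake_fetch(start: int, rows: int) -> List[str]:
    requests_made.append(start)  # record each simulated request
    return ALL_IDS[start:start + rows]


ids = paged_ids(fake_fetch, rows=2)
first = next(ids)   # only the first page has been requested at this point
print(first, requests_made)
rest = list(ids)    # remaining pages are fetched on demand
print(rest, requests_made)
```

In the package itself, `rows` plays the role of the page size. Requesting a single row per page and taking only the first result looks like this:
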
```python
first = next(iter(query(rows=1)))
```

#### Query Editor Link

`Session.rcsb_query_editor_url()` returns a link to the [Search API query editor](https://search.rcsb.org/query-editor.html) populated with the query.

```python
from rcsbsearchapi import AttributeQuery

query = AttributeQuery("exptl.method", operator="exact_match", value="electron microscopy")
session = query()
session.rcsb_query_editor_url()
```

#### Advanced Search Query Builder Link

`Session.rcsb_query_builder_url()` returns a link to the [Advanced Search Query Builder](https://www.rcsb.org/search/advanced) populated with the query.

```python
from rcsbsearchapi import AttributeQuery

query = AttributeQuery("exptl.method", operator="exact_match", value="electron microscopy")
session = query()
session.rcsb_query_builder_url()
```

#### Progress Bar

The `Session.iquery()` method provides a progress bar indicating the number of API requests being made. It requires the `tqdm` package to be installed so that the progress of the query can be tracked interactively.

```python
results = query().iquery()
```

## Search Service Types

The supported search service types are listed in the table below.

|Search service                    |QueryType                 |
|----------------------------------|--------------------------|
|Full-text                         |`TextQuery()`             |
|Attribute (structure or chemical) |`AttributeQuery()`        |
|Sequence similarity               |`SequenceQuery()`         |
|Sequence motif                    |`SeqMotifQuery()`         |
|Structure similarity              |`StructSimilarityQuery()` |
|Structure motif                   |`StructMotifQuery()`      |
|Chemical similarity               |`ChemSimilarityQuery()`   |

Learn more about available search services on the [RCSB PDB Search API docs](https://search.rcsb.org/#search-services).

### Full-Text Search

To perform a general search for structures associated with the phrase "Hemoglobin", you can create a `TextQuery`. This does a "full-text" search, which is a general search on text associated with PDB structures or molecular definitions.
```python
from rcsbsearchapi import TextQuery

# Search for structures associated with the phrase "Hemoglobin"
query = TextQuery(value="Hemoglobin")

# Execute the query by running it as a function
results = query()

# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)
```

### Structure and Chemical Attribute Search

You can also search for specific structure or chemical attributes using an `AttributeQuery`.

```python
from rcsbsearchapi import AttributeQuery

# Construct the query
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # other operators include "contains_phrase" and "exists"
    value="Homo sapiens"
)
results = list(query())  # construct a list from query results
print(results)
```

As Structure Attributes and Chemical Attributes are almost all unique, the package is usually able to automatically determine the search `service` required. However, for attributes that are both Structure and Chemical Attributes (e.g., `rcsb_id`), specifying a search service is required (Structure Attribute service: `text`, Chemical Attribute service: `text_chem`).
```python
from rcsbsearchapi import AttributeQuery

# "rcsb_id" is both a Structure Attribute and Chemical Attribute,
# so the search `service` must be specified
q1 = AttributeQuery(
    attribute="rcsb_id",
    operator="exact_match",
    value="4HHB",
    service="text"  # "text" specifies Structure Attribute search
)
list(q1())

q2 = AttributeQuery(
    attribute="rcsb_id",
    operator="exact_match",
    value="HEM",
    service="text_chem"  # "text_chem" specifies Chemical Attribute search
)
list(q2())
```

|Arguments  |Required|Description                                  |Default|
|-----------|--------|---------------------------------------------|-------|
|attribute  |yes     |Full attribute name                          |       |
|operator   |yes     |Operation for query                          |       |
|value      |no      |Search term(s)                               |       |
|service    |no      |Specify structure or chemical search service |       |
|negation   |no      |Indicates if the operator is negated         |False  |

The `operator` can be one of a number of options, depending on the attribute type being queried. For example, `contains_phrase` or `exact_match` can be used to compare the attribute to a value, or the `exists` operator may be used to check if the attribute exists for a given structure.

Refer to the [Search Attributes](https://search.rcsb.org/structure-search-attributes.html) and [Chemical Attributes](https://search.rcsb.org/chemical-search-attributes.html) documentation for a full list of attributes and applicable operators.
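
As a rough illustration of how the arguments in the table fit together, the sketch below assembles a dictionary shaped like a Search API terminal node. `build_terminal` is a hypothetical helper, not part of `rcsbsearchapi`; its `"text"` default for `service` is an assumption made for the sketch (the package determines the service automatically where it can), and the exact request the package sends may differ.

```python
import json
from typing import Any, Dict, Optional


def build_terminal(
    attribute: str,
    operator: str,
    value: Optional[Any] = None,
    service: str = "text",  # assumed default for this sketch only
    negation: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the argument table above."""
    parameters: Dict[str, Any] = {
        "attribute": attribute,
        "operator": operator,
        "negation": negation,
    }
    # `value` is optional: the `exists` operator takes no search term.
    if value is not None:
        parameters["value"] = value
    return {"type": "terminal", "service": service, "parameters": parameters}


with_value = build_terminal(
    "rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"
)
exists_only = build_terminal("rcsb_entity_source_organism.scientific_name", "exists")
chem = build_terminal("rcsb_id", "exact_match", "HEM", service="text_chem")

print(json.dumps(with_value, indent=2))
print(json.dumps(exists_only, indent=2))
print(json.dumps(chem, indent=2))
```
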

Alternatively, you can construct attribute queries with comparison operators (e.g., `==`, `>`, or `<`) using the `rcsb_attributes` object, which also allows names to be tab-completed:

```python
from rcsbsearchapi import rcsb_attributes as attrs

# Search for structures from humans
query = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
results = list(query())  # construct a list from query results
print(results)
```

The full list of supported comparison operators:

|Operator|Description                      |
|--------|---------------------------------|
|==      |is                               |
|!=      |is not                           |
|>       |greater than                     |
|>=      |greater than or equal to         |
|<       |less than                        |
|<=      |less than or equal to            |
|in      |contains phrase or contains words|

### Sequence Similarity Search

Below is an example from the [RCSB PDB Search API](https://search.rcsb.org/#search-example-3) page, using the sequence search function. This query finds macromolecular PDB entities that share 90% sequence identity with GTPase HRas protein from *Gallus gallus* (*Chicken*).
```python
from rcsbsearchapi.search import SequenceQuery

# Use SequenceQuery class and add parameters
query = SequenceQuery(
    "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGET"
    + "CLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQI"
    + "KRVKDSDDVPMVLVGNKCDLPARTVETRQAQDLARSYGIPYIETSAKTRQ"
    + "GVEDAFYTLVREIRQHKLRKLNPPDESGPGCMNCKCVIS",
    evalue_cutoff=1,
    identity_cutoff=0.9,
    sequence_type="protein"
)

# query("polymer_entity") produces an iterator of IDs with return type - polymer entities
for polyid in query("polymer_entity"):
    print(polyid)
```

|Arguments      |Required|Description                                          |Default  |
|---------------|--------|-----------------------------------------------------|---------|
|value          |yes     |Protein or nucleotide sequence                       |         |
|evalue_cutoff  |no      |Upper cutoff for E-value (lower is more significant) |0.1      |
|identity_cutoff|no      |Lower cutoff for sequence identity (0-1)             |0        |
|sequence_type  |no      |Type of biological sequence ("protein", "dna", "rna")|"protein"|

### Sequence Motif Search

Below is an example from the [RCSB PDB Search API](https://search.rcsb.org/#search-example-6) page, using the sequence motif search function. This query retrieves occurrences of the His2/Cys2 Zinc Finger DNA-binding domain as represented by its PROSITE signature.
```python
from rcsbsearchapi.search import SeqMotifQuery

# Use SeqMotifQuery class and add parameters
query = SeqMotifQuery(
    "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.",
    pattern_type="prosite",
    sequence_type="protein"
)

# query("polymer_entity") produces an iterator of IDs with return type - polymer entities
for polyid in query("polymer_entity"):
    print(polyid)
```

|Arguments      |Required|Description                                          |Default  |
|---------------|--------|-----------------------------------------------------|---------|
|value          |yes     |Motif to search                                      |         |
|pattern_type   |no      |Motif syntax ("simple", "prosite", "regex")          |"simple" |
|sequence_type  |no      |Type of biological sequence ("protein", "dna", "rna")|"protein"|

See [Sequence Motif Search Examples](additional_examples.html#Sequence-Motif-Search-Examples) for more use cases.

### Structure Similarity Search

The PDB archive can be queried using the 3D shape of a protein structure. To perform this query, you must provide 3D protein structure data as an input or parameter, specify a chain ID or assembly ID, choose whether the input structure is compared to Assemblies or Polymer Entity Instances (Chains), and set the search type to either strict or relaxed. More information on how Structure Similarity Queries work can be found on the [RCSB PDB Structure Similarity Search](https://www.rcsb.org/docs/search-and-browse/advanced-search/structure-similarity-search) page.

```python
from rcsbsearchapi.search import StructSimilarityQuery

# Basic query:
# Querying using entry ID and default values:
# assembly ID "1", operator "strict_shape_match", target search space "assembly"
q1 = StructSimilarityQuery(entry_id="4HHB")

# Same query but with parameters explicitly specified
q1 = StructSimilarityQuery(
    structure_search_type="entry_id",
    entry_id="4HHB",
    structure_input_type="assembly_id",
    assembly_id="1",
    operator="strict_shape_match",
    target_search_space="assembly"
)
for rid in q1("assembly"):
    print(rid)
```