API Documentation

RCSB PDB Search API

class rcsbsearchapi.Attr(attribute: str, type: Optional[str] = 'text')

A search attribute, e.g. "rcsb_entry_container_identifiers.entry_id".

Terminals can be constructed from Attr objects either with a functional syntax that mirrors the API operators (e.g. attr.equals(value)) or with Python operators (e.g. attr == value).

Rather than their normal bool return values, comparison operators on an Attr return Terminals.

Pre-instantiated attributes are available from the rcsbsearchapi.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.
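To make the operator behaviour concrete, here is a minimal self-contained sketch of the mechanism. MiniAttr and MiniTerminal are hypothetical stand-ins, not rcsbsearchapi classes; the parameter keys copy the Terminal example given for rcsbsearchapi.Terminal, and the attribute name and operator strings are only illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MiniTerminal:
    """Hypothetical stand-in for a terminal query node."""
    service: str
    params: Dict[str, Any]


class MiniAttr:
    """Hypothetical stand-in for Attr: comparison operators build terminals."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute

    def _terminal(self, operator: str, value: Any) -> MiniTerminal:
        # Parameter names mirror the Terminal example in this document.
        return MiniTerminal("text", {
            "attribute": self.attribute,
            "operator": operator,
            "negation": False,
            "value": value,
        })

    # Each operator returns a MiniTerminal instead of a bool.
    def __eq__(self, value: Any) -> MiniTerminal:  # type: ignore[override]
        return self._terminal("equals", value)

    def __ge__(self, value: Any) -> MiniTerminal:
        return self._terminal("greater_or_equal", value)

    def __lt__(self, value: Any) -> MiniTerminal:
        return self._terminal("less", value)


resolution = MiniAttr("rcsb_entry_info.resolution_combined")
query = resolution < 2.0
print(query.params["operator"], query.params["value"])  # less 2.0
```

Because float.__gt__ does not know about MiniAttr, Python falls back to the reflected method, so 2.0 > resolution also produces a "less" terminal.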

The range dictionary accepted by range() requires the following keys:

“from” -> int

“to” -> int

“include_lower” -> bool

“include_upper” -> bool

__contains__(value: Union[str, List[str], rcsbsearchapi.search.Value[str], rcsbsearchapi.search.Value[List[str]]]) → rcsbsearchapi.search.Terminal

Maps to contains_words or contains_phrase depending on the value passed.

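"value" in attr maps to attr.contains_phrase("value") for simple values.
["value"] in attr maps to attr.contains_words(["value"]) for lists and tuples.

Note that Python's in operator converts the return value of __contains__ to a bool, so an expression such as "value" in attr evaluates to True rather than to a Terminal. To obtain the Terminal, call attr.contains_phrase() or attr.contains_words() directly.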
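The bool coercion can be demonstrated without the library. PhraseAttr below is a hypothetical stand-in whose __contains__ returns a dict in place of a Terminal; the attribute name is only an example.

```python
class PhraseAttr:
    """Hypothetical stand-in: __contains__ returns a query description, not a bool."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute

    def __contains__(self, value: object) -> object:
        # A dict stands in for the Terminal that Attr.__contains__ returns.
        return {"attribute": self.attribute,
                "operator": "contains_phrase",
                "value": value}


attr = PhraseAttr("struct.title")
print("kinase" in attr)             # True: the in operator coerces the dict to bool
print(attr.__contains__("kinase"))  # the dict itself is only reachable by a direct call
```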

__eq__(value: Attr) → bool
__eq__(value: Union[str, int, float, datetime.date, Value[str], Value[int], Value[float], Value[date]]) → rcsbsearchapi.search.Terminal: Return self==value. Comparing with another Attr returns a bool; comparing with a value returns a Terminal.

__ge__(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.Terminal: Return a Terminal matching attribute >= value.

__gt__(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.Terminal: Return a Terminal matching attribute > value.

__le__(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.Terminal: Return a Terminal matching attribute <= value.

__lt__(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.Terminal: Return a Terminal matching attribute < value.

__ne__(value: Attr) → bool
__ne__(value: Union[str, int, float, datetime.date, Value[str], Value[int], Value[float], Value[date]]) → rcsbsearchapi.search.Terminal: Return self!=value. Comparing with another Attr returns a bool; comparing with a value returns a Terminal.

__weakref__: list of weak references to the object (if defined)

contains_phrase(value: Union[str, rcsbsearchapi.search.Value[str]]) → rcsbsearchapi.search.AttributeQuery: Match an exact phrase.

contains_words(value: Union[str, rcsbsearchapi.search.Value[str], List[str], rcsbsearchapi.search.Value[List[str]]]) → rcsbsearchapi.search.AttributeQuery

Match any word within the string.

Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.
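The matching rule just described (split at whitespace, keep anything matching at least one word, rank by the number of matching words) can be modelled locally. rank_by_words is a hypothetical helper for illustration only: the actual matching and scoring happen on the RCSB search server, and this sketch ignores case folding, stemming, and relevance scoring.

```python
from typing import List, Tuple


def rank_by_words(query: str, documents: List[str]) -> List[str]:
    """Keep documents sharing at least one word with query; most shared words first."""
    words = set(query.split())  # str.split() with no argument splits at any whitespace
    scored: List[Tuple[int, str]] = []
    for doc in documents:
        hits = len(words & set(doc.split()))
        if hits > 0:
            scored.append((hits, doc))
    # sort() is stable, also with reverse=True, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = ["serine protease inhibitor", "protease", "kinase domain"]
print(rank_by_words("serine protease", docs))
# ['serine protease inhibitor', 'protease']
```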

equals(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.AttributeQuery: Attribute == value

exact_match(value: Union[str, rcsbsearchapi.search.Value[str]]) → rcsbsearchapi.search.AttributeQuery: Exact match with the value

exists() → rcsbsearchapi.search.AttributeQuery: Attribute is defined for the structure

greater(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.AttributeQuery: Attribute > value

greater_or_equal(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.AttributeQuery: Attribute >= value

in_(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, ...], Tuple[int, ...], Tuple[float, ...], Tuple[datetime.date, ...], rcsbsearchapi.search.Value[List[str]], rcsbsearchapi.search.Value[List[int]], rcsbsearchapi.search.Value[List[float]], rcsbsearchapi.search.Value[List[datetime.date]], rcsbsearchapi.search.Value[Tuple[str, ...]], rcsbsearchapi.search.Value[Tuple[int, ...]], rcsbsearchapi.search.Value[Tuple[float, ...]], rcsbsearchapi.search.Value[Tuple[datetime.date, ...]]]) → rcsbsearchapi.search.AttributeQuery: Attribute is contained in the list of values

less(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.AttributeQuery: Attribute < value

less_or_equal(value: Union[int, float, datetime.date, rcsbsearchapi.search.Value[int], rcsbsearchapi.search.Value[float], rcsbsearchapi.search.Value[datetime.date]]) → rcsbsearchapi.search.AttributeQuery: Attribute <= value

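range(value: Dict[str, Any]) → rcsbsearchapi.search.AttributeQuery

Attribute is within the specified range. Whether each bound is included is controlled by the include_lower and include_upper keys.

Parameters: value – range dictionary with the keys "from", "to", "include_lower", and "include_upper", as described for the Attr class above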
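As a sketch of the dictionary that range() expects, the hypothetical helper make_range below assembles the four required keys and rejects inverted bounds. The helper and its defaults (lower bound included, upper bound excluded) are illustrative choices, not part of rcsbsearchapi.

```python
from typing import Any, Dict, Union

Number = Union[int, float]


def make_range(lower: Number, upper: Number,
               include_lower: bool = True,
               include_upper: bool = False) -> Dict[str, Any]:
    """Build a range dictionary (hypothetical helper, not a library function).

    The defaults describe the half-open interval [lower, upper); that is a
    choice made for this sketch, not a library default.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return {
        "from": lower,
        "to": upper,
        "include_lower": include_lower,
        "include_upper": include_upper,
    }


# Closed interval [1, 3]; with the library it would be passed as attr.range(closed),
# where attr is any Attr instance (for example one taken from rcsb_attributes).
closed = make_range(1, 3, include_upper=True)
```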

class rcsbsearchapi.Group(operator: typing_extensions.Literal['and', 'or'], nodes: Iterable[rcsbsearchapi.search.Query] = ())

AND and OR combinations of queries.

__and__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Intersection: a & b

__invert__(): Negation: ~a

__or__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Union: a | b

_assign_ids(node_id=0) → Tuple[rcsbsearchapi.search.Query, int]

Assign node_ids sequentially for all terminal nodes.

This is a helper for the Query.assign_ids() method.

Parameters: node_id – Id to assign to the first leaf of this query
Returns: query – the modified query, with node_ids assigned; node_id – the next available node_id
Return type: Tuple[rcsbsearchapi.search.Query, int]
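The numbering performed by _assign_ids can be modelled on a toy tree. Leaf, Node, and assign_ids below are hypothetical stand-ins, not the library's classes; the sketch assumes a depth-first, left-to-right traversal, which fits "sequentially" but is not confirmed by this page. Like the documented method, it returns a new tree together with the next free id and leaves the input untouched.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """Hypothetical terminal node carrying a node_id."""
    label: str
    node_id: int = 0


@dataclass(frozen=True)
class Node:
    """Hypothetical group node (AND/OR) with child queries."""
    operator: str
    children: Tuple["Tree", ...]


Tree = Union[Leaf, Node]


def assign_ids(tree: Tree, node_id: int = 0) -> Tuple[Tree, int]:
    """Return a copy of tree with leaves numbered from node_id, plus the next free id."""
    if isinstance(tree, Leaf):
        # Nodes are immutable, so build a new leaf carrying the assigned id.
        return replace(tree, node_id=node_id), node_id + 1
    new_children = []
    for child in tree.children:
        numbered_child, node_id = assign_ids(child, node_id)
        new_children.append(numbered_child)
    return replace(tree, children=tuple(new_children)), node_id


tree = Node("and", (Leaf("a"), Node("or", (Leaf("b"), Leaf("c")))))
numbered, next_id = assign_ids(tree)  # leaves a, b, c get ids 0, 1, 2; next_id is 3
```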

to_dict(): Get dictionary representing this query

class rcsbsearchapi.Query

Base class for all types of queries.

Queries can be combined using set operators:

q1 & q2: Intersection (AND)
q1 | q2: Union (OR)
~q1: Negation (NOT)
q1 - q2: Difference (implemented as q1 & ~q2)
q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))

Note that only AND, OR, and negation of terminals are directly supported by the API; the other operations are rewritten in terms of these and may therefore be slower.
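The rewritten forms of difference and symmetric difference can be checked with ordinary Python sets, treating each query as the set of IDs it returns and negation as the complement within a fixed universe. This is only a model: the identifiers are placeholders, and in practice negation is evaluated by the search service rather than against a local universe.

```python
# Placeholder identifiers standing in for the results of two queries.
universe = {"1ABC", "2DEF", "3GHI", "4JKL"}
q1 = {"1ABC", "2DEF"}
q2 = {"2DEF", "3GHI"}


def invert(results: set) -> set:
    """Negation (~q): everything in the universe that q did not return."""
    return universe - results


difference = q1 & invert(q2)                      # q1 - q2 as q1 & ~q2
sym_diff = (q1 & invert(q2)) | (invert(q1) & q2)  # q1 ^ q2

print(sorted(difference))  # ['1ABC']
print(sorted(sym_diff))    # ['1ABC', '3GHI']
```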

Queries can be executed by calling them as functions (list(query())) or with the exec() method.

Queries are immutable, and all modifying functions return new instances.

__and__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Intersection: a & b

__call__(return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal['experimental', 'computational']] = ['experimental'], results_verbosity: typing_extensions.Literal['compact', 'minimal', 'verbose'] = 'compact') → rcsbsearchapi.search.Session: Evaluate this query and return an iterator of all result IDs

abstract __invert__() → rcsbsearchapi.search.Query: Negation: ~a

__or__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Union: a | b

__sub__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Difference: a - b

__weakref__: list of weak references to the object (if defined)

__xor__(other: rcsbsearchapi.search.Query) → rcsbsearchapi.search.Query: Symmetric difference: a ^ b

abstract _assign_ids(node_id=0) → Tuple[rcsbsearchapi.search.Query, int]

Assign node_ids sequentially for all terminal nodes.

This is a helper for the Query.assign_ids() method.

Parameters: node_id – Id to assign to the first leaf of this query
Returns: query – the modified query, with node_ids assigned; node_id – the next available node_id
Return type: Tuple[rcsbsearchapi.search.Query, int]

and_(other: Query) → Query
and_(other: Union[str, Attr]) → PartialQuery: Extend this query with an additional attribute via an AND

assign_ids() → rcsbsearchapi.search.Query

Assign node_ids sequentially for all terminal nodes.

Returns: the modified query, with node_ids assigned sequentially from 0

count(return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', return_content_type: List[typing_extensions.Literal['experimental', 'computational']] = ['experimental']) → int: Get the number of results found by this query

exec(return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal['experimental', 'computational']] = ['experimental'], results_verbosity: typing_extensions.Literal['compact', 'minimal', 'verbose'] = 'compact') → rcsbsearchapi.search.Session: Evaluate this query and return an iterator of all result IDs

facets(return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', facets: Optional[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet, List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]]] = None) → List: Perform a facets query and return the buckets

or_(other: Query) → Query
or_(other: Union[str, Attr]) → PartialQuery: Extend this query with an additional attribute via an OR

abstract to_dict() → Dict: Get dictionary representing this query

to_json() → str: Get JSON string of this query

class rcsbsearchapi.Session(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal['experimental', 'computational']] = ['experimental'], results_verbosity: typing_extensions.Literal['compact', 'minimal', 'verbose'] = 'compact', facets: Optional[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet, List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]]] = None)

A single query session.

Handles paging the query and parsing results.
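Paging can be pictured as a generator that requests one page at a time and advances the start offset by the number of hits received. The sketch below is not the Session implementation: fetch_page is a hypothetical callable standing in for the HTTP request, and the response field names total_count, result_set, and identifier are assumptions modelled on the RCSB Search API's JSON responses, so check them against the service documentation.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical page fetcher: takes (start, rows) and returns a parsed JSON
# response, or None when there are no (more) results.
PageFetcher = Callable[[int, int], Optional[Dict]]


def iter_identifiers(fetch_page: PageFetcher, rows: int = 10000) -> Iterator[str]:
    """Yield result identifiers page by page until all results are seen."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            return
        hits = page.get("result_set", [])
        # Accept both bare identifiers and {"identifier": ...} hit objects.
        ids = [hit if isinstance(hit, str) else hit["identifier"] for hit in hits]
        yield from ids
        start += len(ids)
        # Stop on an empty page or once every reported result has been seen.
        if not ids or start >= page.get("total_count", 0):
            return


# Usage with an in-memory fake backend that serves five placeholder IDs.
data: List[str] = [f"ID{i}" for i in range(5)]


def fake_fetch(start: int, rows: int) -> Optional[Dict]:
    chunk = data[start:start + rows]
    if not chunk:
        return None
    return {"total_count": len(data), "result_set": chunk}


print(list(iter_identifiers(fake_fetch, rows=2)))  # ['ID0', 'ID1', 'ID2', 'ID3', 'ID4']
```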

__init__(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal['entry', 'assembly', 'polymer_entity', 'non_polymer_entity', 'polymer_instance', 'mol_definition'] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal['experimental', 'computational']] = ['experimental'], results_verbosity: typing_extensions.Literal['compact', 'minimal', 'verbose'] = 'compact', facets: Optional[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet, List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]]] = None): Initialize the session. The parameters are the same as in the class signature.

__iter__() → Iterator[str]: Generator yielding the identifier of each result

__weakref__: list of weak references to the object (if defined)

static _extract_identifiers(query_json: Optional[Dict]) → List[str]: Extract identifiers from a JSON response

_make_params(start=0): Generate GET parameters as a dict

_single_query(start=0) → Optional[Dict]: Fire a single query request

iquery(limit: Optional[int] = None) → List[str]

Evaluate the query and display an interactive progress bar.

Requires tqdm.

static make_uuid() → str: Create a new UUID to identify a query

rcsb_query_builder_url() → str: URL to view this query on the RCSB PDB website query builder

rcsb_query_editor_url() → str: URL to edit this query in the RCSB PDB query editor

class rcsbsearchapi.Terminal(service: str, params: Dict[str, Any], node_id: int = 0)

A terminal query node.

Used for the various types of searches. Accepts a service type and a dictionary of parameters; the set of parameters differs between search services.

A Terminal can be built directly by passing a service name and a parameter dictionary, but this is tedious. More commonly, Terminals are built by child classes that each represent one type of search, which allows more concise queries.

Examples

>>> Terminal("full_text", {"value": "protease"})
>>> Terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})
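For orientation, the sketch below assembles by hand a request combining the two example terminals with an AND group. The node layout ("type", "service", "parameters", "logical_operator", "nodes") and the top-level "query" and "return_type" keys are assumptions modelled on the public RCSB Search API request format; the dictionary that Terminal.to_dict() or Query.to_json() actually produces may differ, for example in whether node_id is included. The helpers terminal and group are hypothetical.

```python
import json
from typing import Any, Dict, List


def terminal(service: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: a terminal node is one search service plus its parameters."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def group(logical_operator: str, nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: a group node combines child nodes with "and" or "or"."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": nodes}


request = {
    "query": group("and", [
        terminal("full_text", {"value": "protease"}),
        terminal("text", {"attribute": "rcsb_id", "operator": "in",
                          "negation": False, "value": ["5T89", "1TIM"]}),
    ]),
    "return_type": "entry",
}
print(json.dumps(request, indent=2))
```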

__invert__(): Negation: ~a

_assign_ids(node_id=0) → Tuple[rcsbsearchapi.search.Query, int]

Assign node_ids sequentially for all terminal nodes.

This is a helper for the Query.assign_ids() method.

Parameters: node_id – Id to assign to the first leaf of this query
Returns: query – the modified query, with node_ids assigned; node_id – the next available node_id
Return type: Tuple[rcsbsearchapi.search.Query, int]

to_dict(): Get dictionary representing this query

class rcsbsearchapi.TextQuery(value: str)

Special case of a Terminal for free-text queries.

__init__(value: str)

Search for the string value anywhere in the text.

Parameters: value – free-text query

class rcsbsearchapi.Value(value: T)

Represents a value in a query.

In most cases values are unnecessary and can be replaced directly by the Python value.

Values can also be used when the Attr object appears on the right-hand side of a comparison:

Value("4HHB") == Attr("rcsb_entry_container_identifiers.entry_id")
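A comparison written with the value on the left describes the same condition as the flipped comparison with the attribute on the left: value >= attr is attr <= value. The sketch below models that flip with two hypothetical classes, Field and Wrapped, which are stand-ins for Attr and Value rather than the library's implementation.

```python
class Field:
    """Hypothetical attribute stand-in whose comparisons return descriptive tuples."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __le__(self, value):
        return (self.name, "less_or_equal", value)

    def __gt__(self, value):
        return (self.name, "greater", value)


class Wrapped:
    """Hypothetical Value stand-in: moves the comparison onto the attribute."""

    def __init__(self, value) -> None:
        self.value = value

    def __ge__(self, field: Field):
        # value >= field is the same condition as field <= value
        return field <= self.value

    def __lt__(self, field: Field):
        # value < field is the same condition as field > value
        return field > self.value


print(Wrapped(3.0) >= Field("resolution"))  # ('resolution', 'less_or_equal', 3.0)
```

For ordering operators, a plain Python value on the left already triggers the same flip: float.__ge__ returns NotImplemented for an unknown type, so Python calls the reflected Field.__le__, which is one reason wrapper values are rarely needed.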

__eq__(attr: Value) → bool
__eq__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) == attr, i.e. attr == v.

__ge__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) >= attr, i.e. attr <= v.

__gt__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) > attr, i.e. attr < v.

__le__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) <= attr, i.e. attr >= v.

__lt__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) < attr, i.e. attr > v.

__ne__(attr: Value) → bool
__ne__(attr: rcsbsearchapi.search.Attr) → rcsbsearchapi.search.Terminal: Return a Terminal for Value(v) != attr, i.e. attr != v.

__weakref__: list of weak references to the object (if defined)

rcsbsearchapi.rcsb_attributes: rcsbsearchapi.schema.SchemaGroup

Object with all known RCSB PDB attributes.

This is provided to ease autocompletion as compared to creating Attr objects from strings. For example,

rcsb_attributes.rcsb_nonpolymer_instance_feature_summary.chem_id

is equivalent to

Attr('rcsb_nonpolymer_instance_feature_summary.chem_id')

All attributes in rcsb_attributes can be iterated over.

>>> [a for a in rcsb_attributes if "stoichiometry" in a.attribute]
[Attr(attribute='rcsb_struct_symmetry.stoichiometry')]

Attributes matching a regular expression can also be filtered:

>>> list(rcsb_attributes.search('rcsb.*stoichiometry'))
[Attr(attribute='rcsb_struct_symmetry.stoichiometry')]
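The filtering step can be reproduced on plain strings with the standard re module. search_attributes is a hypothetical helper, and it assumes re.search semantics (match anywhere in the name), which this page does not confirm for rcsb_attributes.search(). With the library itself, the names would come from iterating rcsb_attributes and reading a.attribute, as in the comprehension shown above.

```python
import re
from typing import Iterable, Iterator


def search_attributes(pattern: str, names: Iterable[str]) -> Iterator[str]:
    """Yield attribute names matching a regular expression (re.search semantics assumed)."""
    regex = re.compile(pattern)
    for name in names:
        if regex.search(name):
            yield name


# A small subset of attribute names mentioned in this document, for illustration only.
names = [
    "rcsb_entry_container_identifiers.entry_id",
    "rcsb_struct_symmetry.stoichiometry",
    "rcsb_nonpolymer_instance_feature_summary.chem_id",
]
print(list(search_attributes("rcsb.*stoichiometry", names)))
# ['rcsb_struct_symmetry.stoichiometry']
```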