API Documentation

RCSB PDB Search API

class rcsbsearchapi.Attr(attribute: str, type: Optional[Union[List[str], str]], description: Optional[Union[List[str], str]] = None)

A search attribute, e.g. “rcsb_entry_container_identifiers.entry_id”

Terminals can be constructed from Attr objects using either a functional syntax, which mirrors the API operators, or with python operators.

Previously, __bool__ was overloaded to run the exists function. Because __bool__ must return a boolean value, that overload was removed; call exists() explicitly instead.

Rather than their normal bool return values, operators return Terminals.

Pre-instantiated attributes are available from the rcsbsearchapi.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.

The range dictionary requires the following keys:

  • “from” -> int

  • “to” -> int

  • “include_lower” -> bool

  • “include_upper” -> bool
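The four keys can be illustrated with a plain dictionary. This is a minimal sketch, not part of rcsbsearchapi: in_range is a hypothetical helper that shows how the two inclusion flags would be read if the range were evaluated locally.

```python
from typing import Any, Dict


def in_range(x: float, rng: Dict[str, Any]) -> bool:
    """Hypothetical helper: test x against a range dictionary locally."""
    lower_ok = x >= rng["from"] if rng["include_lower"] else x > rng["from"]
    upper_ok = x <= rng["to"] if rng["include_upper"] else x < rng["to"]
    return lower_ok and upper_ok


# Half-open range [1, 5): lower bound included, upper bound excluded.
half_open = {"from": 1, "to": 5, "include_lower": True, "include_upper": False}

print(in_range(1, half_open))
print(in_range(5, half_open))
```

With these flags the lower bound 1 is accepted and the upper bound 5 is rejected.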

contains_phrase(value: Union[str, Value[str]])AttributeQuery

Match an exact phrase

contains_words(value: Union[str, Value[str], List[str], Value[List[str]]])AttributeQuery

Match any word within the string.

Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.
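The matching rule described above can be imitated locally. The sketch below is an illustration only: rank_by_words is a hypothetical function, the titles are made-up sample data, and the real search service very likely uses more elaborate tokenization and scoring.

```python
from typing import List, Tuple


def rank_by_words(query: str, titles: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical local imitation: keep titles matching any word, most matches first."""
    words = {w.lower() for w in query.split()}  # split at whitespace
    scored = []
    for title in titles:
        title_words = {w.lower() for w in title.split()}
        hits = len(words & title_words)
        if hits > 0:
            scored.append((title, hits))
    # Titles matching more query words sort first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


titles = ["human hemoglobin structure", "hemoglobin", "lysozyme"]
ranked = rank_by_words("human hemoglobin", titles)
print(ranked)
```

Here the title containing both words ranks above the title containing one, and the title containing neither is dropped.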

equals(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute == value

exact_match(value: Union[str, Value[str]])AttributeQuery

Exact match with the value

exists()rcsbsearchapi.search.AttributeQuery

Attribute is defined for the structure

greater(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute > value

greater_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute >= value

in_(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Value[List[str]], Value[List[int]], Value[List[float]], Value[List[date]], Value[Tuple[str, ]], Value[Tuple[int, ]], Value[Tuple[float, ]], Value[Tuple[date, ]]])AttributeQuery

Attribute is contained in the list of values

less(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute < value

less_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute <= value

range(value: Dict[str, Any])AttributeQuery

Attribute is within the specified half-open range

Parameters

value – lower and upper bounds [a, b)

type: Optional[Union[List[str], str]]

search service type. text for structure attributes, text_chem for chemical attributes

class rcsbsearchapi.AttributeQuery(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)

Special case of a Terminal for Structure and Chemical Attribute Searches

An AttributeQuery compares some attribute of a structure to a value.

Examples

>>> AttributeQuery("exptl.method", "exact_match", "X-RAY DIFFRACTION")
>>> AttributeQuery("rcsb_entry_container_identifiers.entry_id", operator="in", value=["4HHB", "2GS2"])

A full list of attributes is available in the schema. Operators are documented here.

The Attr class provides a more pythonic way of constructing AttributeQueries.

__init__(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)

Search for the given value using the specified attribute and operator. The service and negation can also be specified.

Parameters
  • attribute (Optional[str], optional) – attribute to search on (e.g. struct.title, exptl.method, rcsb_id). Defaults to None.

  • operator (Optional[str], optional) – operation to perform for the search (e.g. “contains_phrase”, “exact_match”). Defaults to None.

  • value (Optional[TValue], optional) – value to compare attribute to. Defaults to None.

  • service (Optional[str], optional) – search service to use (e.g. “text”, “text_chem”). Defaults to None.

  • negation (Optional[bool], optional) – logical not. Defaults to False.

class rcsbsearchapi.Group(operator: typing_extensions.Literal[and, or], nodes: Iterable[rcsbsearchapi.search.Query] = ())

AND and OR combinations of queries

to_dict()

Get dictionary representing this query

class rcsbsearchapi.Query

Base class for all types of queries.

Queries can be combined using set operators:

  • q1 & q2: Intersection (AND)

  • q1 | q2: Union (OR)

  • ~q1: Negation (NOT)

  • q1 - q2: Difference (implemented as q1 & ~q2)

  • q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))

Note that only AND, OR, and negation of terminals are directly supported by the API, so other operations may be slower.
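The two derived operators can be checked against Python's built-in set operations. This is a sketch using plain sets of made-up identifiers in place of query results; negate is a hypothetical stand-in for ~q and needs an explicit universe, which real queries do not.

```python
# Hypothetical result sets; the identifiers are placeholders.
universe = {"1ABC", "2DEF", "3GHI", "4JKL"}
q1 = {"1ABC", "2DEF"}
q2 = {"2DEF", "3GHI"}


def negate(results: set) -> set:
    """Stand-in for ~q: everything in the universe that q did not match."""
    return universe - results


# q1 - q2 expressed with only AND and NOT.
difference = q1 & negate(q2)

# q1 ^ q2 expressed with only AND, OR and NOT.
symmetric_difference = (q1 & negate(q2)) | (negate(q1) & q2)

print(difference == q1 - q2)
print(symmetric_difference == q1 ^ q2)
```

Both comparisons hold, so the rewritten forms select the same members as the native set operators.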

Queries can be executed by calling them as functions (list(query())) or using the exec function.

Queries are immutable, and all modifying functions return new instances.

and_(other: Query)Query
and_(other: Union[str, Attr])PartialQuery

Extend this query with an additional attribute via an AND

assign_ids()Query

Assign node_ids sequentially for all terminal nodes

Returns

the modified query, with node_ids assigned sequentially from 0
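Sequential numbering of terminals can be sketched on a nested dictionary tree. This is an illustration under assumptions, not the library's code: the "type"/"nodes" layout and the depth-first visiting order are assumed, and this assign_ids is a standalone function rather than the method documented above. It returns a new tree, in keeping with the immutability noted earlier.

```python
from itertools import count
from typing import Any, Dict, Iterator, Optional


def assign_ids(node: Dict[str, Any], counter: Optional[Iterator[int]] = None) -> Dict[str, Any]:
    """Return a copy of the tree with terminal node_ids numbered from 0."""
    if counter is None:
        counter = count(0)
    if node["type"] == "terminal":
        return {**node, "node_id": next(counter)}
    # Group node: rebuild the child list, sharing one counter across the tree.
    return {**node, "nodes": [assign_ids(child, counter) for child in node["nodes"]]}


tree = {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
        {"type": "terminal", "service": "full_text"},
        {
            "type": "group",
            "logical_operator": "or",
            "nodes": [
                {"type": "terminal", "service": "text"},
                {"type": "terminal", "service": "text"},
            ],
        },
    ],
}

numbered = assign_ids(tree)
print(numbered["nodes"][1]["nodes"][1]["node_id"])
```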

exec(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)Union[Session, int]

Evaluate this query and return an iterator of all result IDs

or_(other: Query)Query
or_(other: Union[str, Attr])PartialQuery

Extend this query with an additional attribute via an OR

abstract to_dict()Dict

Get dictionary representing this query

to_json()str

Get JSON string of this query

class rcsbsearchapi.Terminal(service: Union[List, str], params: Dict[str, Any], node_id: int = 0)

A terminal query node.

Used for doing various types of searches. Accepts a service type and a dictionary of parameters. The set of parameters differs for different search services.

Terminal can be built by passing in a service and parameter dictionary, but it’s tedious work. Typically, it’s built by child classes that each represent a unique type of search. This allows for more concise searching.

Examples

>>> Terminal("full_text", {"value": "protease"})
>>> Terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})

to_dict()

Get dictionary representing this query
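As a rough illustration of what a serialized terminal node might look like, the sketch below assembles one by hand with the standard library. The "type"/"service"/"parameters" layout is an assumption modelled on the RCSB Search API request format, not output captured from to_dict(), and terminal_node is a hypothetical helper.

```python
import json
from typing import Any, Dict


def terminal_node(service: str, params: Dict[str, Any], node_id: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: assemble a terminal node by hand (assumed layout)."""
    return {
        "type": "terminal",
        "service": service,
        "parameters": params,
        "node_id": node_id,
    }


node = terminal_node(
    "text",
    {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]},
)
print(json.dumps(node, indent=2))
```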

class rcsbsearchapi.TextQuery(value: str)

Special case of a Terminal for free-text queries

__init__(value: str)

Search for the string value anywhere in the text

Parameters

value – free-text query

Interact with the [RCSB PDB Search API](https://search.rcsb.org/#search-api).

class rcsbsearchapi.search.Attr(attribute: str, type: Optional[Union[List[str], str]], description: Optional[Union[List[str], str]] = None)

A search attribute, e.g. “rcsb_entry_container_identifiers.entry_id”

Terminals can be constructed from Attr objects using either a functional syntax, which mirrors the API operators, or with python operators.

Previously, __bool__ was overloaded to run the exists function. Because __bool__ must return a boolean value, that overload was removed; call exists() explicitly instead.

Rather than their normal bool return values, operators return Terminals.

Pre-instantiated attributes are available from the rcsbsearchapi.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.

The range dictionary requires the following keys:

  • “from” -> int

  • “to” -> int

  • “include_lower” -> bool

  • “include_upper” -> bool

contains_phrase(value: Union[str, Value[str]])AttributeQuery

Match an exact phrase

contains_words(value: Union[str, Value[str], List[str], Value[List[str]]])AttributeQuery

Match any word within the string.

Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.

equals(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute == value

exact_match(value: Union[str, Value[str]])AttributeQuery

Exact match with the value

exists()rcsbsearchapi.search.AttributeQuery

Attribute is defined for the structure

greater(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute > value

greater_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute >= value

in_(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Value[List[str]], Value[List[int]], Value[List[float]], Value[List[date]], Value[Tuple[str, ]], Value[Tuple[int, ]], Value[Tuple[float, ]], Value[Tuple[date, ]]])AttributeQuery

Attribute is contained in the list of values

less(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute < value

less_or_equal(value: Union[int, float, datetime.date, Value[int], Value[float], Value[date]])AttributeQuery

Attribute <= value

range(value: Dict[str, Any])AttributeQuery

Attribute is within the specified half-open range

Parameters

value – lower and upper bounds [a, b)

type: Optional[Union[List[str], str]]

search service type. text for structure attributes, text_chem for chemical attributes

class rcsbsearchapi.search.AttributeQuery(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)

Special case of a Terminal for Structure and Chemical Attribute Searches

An AttributeQuery compares some attribute of a structure to a value.

Examples

>>> AttributeQuery("exptl.method", "exact_match", "X-RAY DIFFRACTION")
>>> AttributeQuery("rcsb_entry_container_identifiers.entry_id", operator="in", value=["4HHB", "2GS2"])

A full list of attributes is available in the schema. Operators are documented here.

The Attr class provides a more pythonic way of constructing AttributeQueries.

__init__(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, ], Tuple[int, ], Tuple[float, ], Tuple[datetime.date, ], Dict[str, Any]]] = None, service: Optional[Union[List[str], str]] = None, negation: Optional[bool] = False)

Search for the given value using the specified attribute and operator. The service and negation can also be specified.

Parameters
  • attribute (Optional[str], optional) – attribute to search on (e.g. struct.title, exptl.method, rcsb_id). Defaults to None.

  • operator (Optional[str], optional) – operation to perform for the search (e.g. “contains_phrase”, “exact_match”). Defaults to None.

  • value (Optional[TValue], optional) – value to compare attribute to. Defaults to None.

  • service (Optional[str], optional) – search service to use (e.g. “text”, “text_chem”). Defaults to None.

  • negation (Optional[bool], optional) – logical not. Defaults to False.

class rcsbsearchapi.search.ChemSimilarityQuery(value: Optional[str] = None, query_type: typing_extensions.Literal[formula, descriptor] = 'formula', descriptor_type: Optional[typing_extensions.Literal[InChI, SMILES]] = None, match_subset: Optional[bool] = False, match_type: Optional[typing_extensions.Literal[graph - relaxed - stereo, graph - relaxed, fingerprint - similarity, sub - struct - graph - relaxed - stereo, sub - struct - graph - relaxed, graph - exact]] = None)

Special case of Terminal for chemical similarity search queries

__init__(value: Optional[str] = None, query_type: typing_extensions.Literal[formula, descriptor] = 'formula', descriptor_type: Optional[typing_extensions.Literal[InChI, SMILES]] = None, match_subset: Optional[bool] = False, match_type: Optional[typing_extensions.Literal[graph - relaxed - stereo, graph - relaxed, fingerprint - similarity, sub - struct - graph - relaxed - stereo, sub - struct - graph - relaxed, graph - exact]] = None)
Parameters
  • value (Optional[str], optional) – chemical formula or descriptor (SMILES or InChI). Defaults to None.

  • query_type (ChemSimType, optional) – “formula” or “descriptor”. Defaults to “formula”.

  • descriptor_type (Optional[SubsetDescriptorType], optional) – if “descriptor”, whether it’s “SMILES” or “InChI”. Defaults to None.

  • match_subset (Optional[bool], optional) – if “formula”, return chemical components/structures that contain the formula as a subset. Defaults to False.

  • match_type (Optional[ChemSimMatchType], optional) – if “descriptor”, type of matches to find and return (see below). Defaults to None.

Guide for “match_type” options:

+-----------------------------------+-------------------------------------------+
| match_type                        | Description                               |
+===================================+===========================================+
| “graph-relaxed”                   | Similar Ligands (including Stereoisomers) |
| “graph-relaxed-stereo”            | Similar Ligands (Stereospecific)          |
| “fingerprint-similarity”          | Similar Ligands (Quick screen)            |
| “sub-struct-graph-relaxed-stereo” | Substructure (Stereospecific)             |
| “sub-struct-graph-relaxed”        | Substructure (including Stereoisomers)    |
| “graph-exact”                     | Exact match                               |
+-----------------------------------+-------------------------------------------+

class rcsbsearchapi.search.Facet(name: str, aggregation_type: typing_extensions.Literal[terms, histogram, date_histogram, range, date_range, cardinality], attribute: str, interval: Optional[Union[int, str]] = None, ranges: Optional[List[rcsbsearchapi.search.Range]] = None, min_interval_population: Optional[int] = None, max_num_intervals: Optional[int] = None, precision_threshold: Optional[int] = None, nested_facets: Optional[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet, List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]]] = None)

Facet object for use in a faceted query.

Attributes:

  • name (str) – Specifies the name of the aggregation.

  • aggregation_type (AggregationType) – Specifies the type of the aggregation. Can be “terms”, “histogram”, “date_histogram”, “range”, “date_range”, or “cardinality”.

  • attribute (str) – Specifies the full attribute name to aggregate on.

  • interval (Optional[Union[int, str]], optional) – Size of the intervals into which a given set of values is divided. Required only for use with “histogram” and “date_histogram” aggregation types (defaults to None if not included).

  • ranges (Optional[List[Range]], optional) – A set of ranges, each representing a bucket. Note that this aggregation includes the ‘from’ value and excludes the ‘to’ value for each range. Should be a list of Range objects (leave the “include_lower” and “include_upper” fields empty). Required only for use with “range” and “date_range” aggregation types (defaults to None if not included).

  • min_interval_population (Optional[int], optional) – Minimum number of items (>= 0) in the bin required for the bin to be returned. Only for use with “terms”, “histogram”, and “date_histogram” facets (defaults to 1 for these aggregation types, otherwise defaults to None).

  • max_num_intervals (Optional[int], optional) – Maximum number of intervals (<= 65336) to return for a given facet. Only for use with “terms” aggregation type (defaults to 65336 for this aggregation type, otherwise defaults to None).

  • precision_threshold (Optional[int], optional) – Trades memory for accuracy by defining a unique count (<= 40000) below which counts are expected to be close to accurate. Only for use with “cardinality” aggregation type (defaults to 40000 for this aggregation type, otherwise defaults to None).

  • nested_facets (Optional[Union[Facet, FilterFacet, List[Union[Facet, FilterFacet]]]], optional) – Enables multi-dimensional aggregations. Should contain a List of Facets or FilterFacets. Can be used with any aggregation type. Defaults to None.

to_dict()dict

Get dictionary representing request option, skips values of None
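The "skips values of None" behaviour can be sketched with a dictionary comprehension. This is an assumption-laden illustration: facet_to_dict is a hypothetical function, the flat key layout is invented for the example and is not claimed to match what Facet.to_dict() actually emits, and the facet name and attribute are sample values.

```python
from typing import Any, Dict, Optional, Union


def facet_to_dict(
    name: str,
    aggregation_type: str,
    attribute: str,
    interval: Optional[Union[int, str]] = None,
    min_interval_population: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: collect the options, then drop every None entry."""
    fields = {
        "name": name,
        "aggregation_type": aggregation_type,
        "attribute": attribute,
        "interval": interval,
        "min_interval_population": min_interval_population,
    }
    return {key: value for key, value in fields.items() if value is not None}


facet = facet_to_dict(
    "Release Date",
    "date_histogram",
    "rcsb_accession_info.initial_release_date",
    interval="year",
)
print(facet)
```

Because min_interval_population was left at None, it does not appear in the resulting dictionary.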

class rcsbsearchapi.search.FilterFacet(filter: Union[rcsbsearchapi.search.TerminalFilter, rcsbsearchapi.search.GroupFilter], facets: Union[rcsbsearchapi.search.Facet, FilterFacet, List[Union[rcsbsearchapi.search.Facet, FilterFacet]]])

Filter results that contribute to bucket count

filter

filter to apply to facets

Type

Union[TerminalFilter, GroupFilter]

facets
Type

Union[Facet, “FilterFacet”, List[Union[Facet, “FilterFacet”]]]

class rcsbsearchapi.search.Group(operator: typing_extensions.Literal[and, or], nodes: Iterable[rcsbsearchapi.search.Query] = ())

AND and OR combinations of queries

to_dict()

Get dictionary representing this query

class rcsbsearchapi.search.GroupBy(aggregation_method: str, similarity_cutoff: Optional[int] = None, ranking_criteria_type: Optional[rcsbsearchapi.search.RankingCriteriaType] = None)

return results as groups

aggregation_method

“matching_deposit_group_id”, “sequence_identity”, “matching_uniprot_accession”.

Type

str

similarity_cutoff

only for aggregation method “sequence_identity”, identity threshold for grouping. 100, 95, 90, 70, 50, or 30. Defaults to None.

Type

int, optional

ranking_criteria_type

control ordering of results. Defaults to None.

Type

Optional[RankingCriteriaType], optional

to_dict()Dict

Get dictionary representing request option, skips values of None
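The relationship between aggregation_method and similarity_cutoff can be written as a small check. This is a sketch only: check_group_by is hypothetical, and it assumes that a cutoff is mandatory for “sequence_identity” and must be left at None otherwise, which the text above implies but does not state outright.

```python
from typing import Optional

# Identity thresholds listed for the "sequence_identity" aggregation method.
ALLOWED_CUTOFFS = {100, 95, 90, 70, 50, 30}


def check_group_by(aggregation_method: str, similarity_cutoff: Optional[int] = None) -> bool:
    """Hypothetical check: is this combination consistent with the attribute notes?"""
    if aggregation_method == "sequence_identity":
        return similarity_cutoff in ALLOWED_CUTOFFS
    # For the other methods the cutoff does not apply and should stay None.
    return similarity_cutoff is None


print(check_group_by("sequence_identity", 95))
print(check_group_by("sequence_identity", 80))
print(check_group_by("matching_uniprot_accession"))
```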

class rcsbsearchapi.search.GroupFilter(logical_operator: typing_extensions.Literal[and, or], nodes: List[Union[TerminalFilter, GroupFilter]])

Group filter class for use with FilterFacet queries

logical_operator

“and”, “or” logical operator

Type

TAndOr

nodes

list of filters to combine

Type

List[Union[“TerminalFilter”, “GroupFilter”]]

to_dict()

Get dictionary representing request option, skips values of None

class rcsbsearchapi.search.PartialQuery(query: rcsbsearchapi.search.Query, operator: typing_extensions.Literal[and, or], attr: rcsbsearchapi.search.Attr)

A PartialQuery extends a growing query with an Attr. It is constructed using the fluent syntax with the and_ and or_ methods. It is not usually necessary to create instances of this class directly.

PartialQuery instances behave like Attr instances in most situations.

__init__(query: rcsbsearchapi.search.Query, operator: typing_extensions.Literal[and, or], attr: rcsbsearchapi.search.Attr)

Initialize self. See help(type(self)) for accurate signature.

class rcsbsearchapi.search.Query

Base class for all types of queries.

Queries can be combined using set operators:

  • q1 & q2: Intersection (AND)

  • q1 | q2: Union (OR)

  • ~q1: Negation (NOT)

  • q1 - q2: Difference (implemented as q1 & ~q2)

  • q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))

Note that only AND, OR, and negation of terminals are directly supported by the API, so other operations may be slower.

Queries can be executed by calling them as functions (list(query())) or using the exec function.

Queries are immutable, and all modifying functions return new instances.

and_(other: Query)Query
and_(other: Union[str, Attr])PartialQuery

Extend this query with an additional attribute via an AND

assign_ids()Query

Assign node_ids sequentially for all terminal nodes

Returns

the modified query, with node_ids assigned sequentially from 0

exec(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)Union[Session, int]

Evaluate this query and return an iterator of all result IDs

or_(other: Query)Query
or_(other: Union[str, Attr])PartialQuery

Extend this query with an additional attribute via an OR

abstract to_dict()Dict

Get dictionary representing this query

to_json()str

Get JSON string of this query

class rcsbsearchapi.search.Range(start: Optional[Union[str, float]] = None, end: Optional[Union[str, float]] = None, include_lower: Optional[bool] = None, include_upper: Optional[bool] = None)

Primarily for use with “range” and “date_range” aggregations with the Facet class. include_upper and include_lower should not be used with Facet queries.

Either start or end is required to construct a Range.

start

Type

Optional[Union[str, float]]

end
Type

Optional[Union[str, float]]

include_lower

whether to include start value in range

Type

Optional[bool]

include_upper

whether to include end value in range

Type

Optional[bool]

class rcsbsearchapi.search.RankingCriteriaType(sort_by: str, filter: Optional[Union[rcsbsearchapi.search.GroupFilter, rcsbsearchapi.search.TerminalFilter]] = None, direction: Optional[typing_extensions.Literal[asc, desc]] = None)

Request option controlling the order that results are returned

sort_by

“score”, “size”, “count”, or full attribute name

Type

str

filter

filter out results

Type

Optional[Union[GroupFilter, TerminalFilter]], optional

direction

The order in which to sort. Undefined defaults to “desc”.

Type

Optional[Literal[“asc”, “desc”]]

class rcsbsearchapi.search.RequestOption

Base class for request options. Note: return_all_hits and paginate are not implemented; the package handles them automatically.

abstract to_dict()Dict

Get dictionary representing request option, skips values of None

class rcsbsearchapi.search.SeqMotifQuery(value: str, pattern_type: Optional[typing_extensions.Literal[simple, prosite, regex]] = 'simple', sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')

Special case of a terminal for protein, DNA, or RNA sequence motif queries

__init__(value: str, pattern_type: Optional[typing_extensions.Literal[simple, prosite, regex]] = 'simple', sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')
Parameters
  • value (str) – motif to search

  • pattern_type (Optional[SeqMode], optional) – motif syntax (“simple”, “prosite”, “regex”). Defaults to “simple”.

  • sequence_type (Optional[SequenceType], optional) – type of biological sequence (“protein”, “dna”, “rna”). Defaults to “protein”.

class rcsbsearchapi.search.SequenceQuery(value: str, evalue_cutoff: Optional[float] = 0.1, identity_cutoff: Optional[float] = 0, sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')

Special case of a terminal for protein, DNA, or RNA sequence queries

__init__(value: str, evalue_cutoff: Optional[float] = 0.1, identity_cutoff: Optional[float] = 0, sequence_type: Optional[typing_extensions.Literal[dna, rna, protein]] = 'protein')

The string value is a target sequence that is searched

Parameters
  • value (str) – protein or nucleotide sequence

  • evalue_cutoff (Optional[float], optional) – upper cutoff for E-value (lower is more significant). Defaults to 0.1.

  • identity_cutoff (Optional[float], optional) – lower cutoff for percent sequence match (0-1). Defaults to 0.

  • sequence_type (Optional[SequenceType], optional) – type of biological sequence (“protein”, “dna”, “rna”). Defaults to “protein”.
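The direction of the two cutoffs is easy to mix up, so here is a small local illustration. The hits are fabricated placeholders, apply_cutoffs is a hypothetical function, and treating both bounds as inclusive is an assumption; the actual filtering happens on the search service.

```python
from typing import Any, Dict, List

# Made-up hits for illustration only.
hits = [
    {"id": "HIT_A", "evalue": 1e-30, "identity": 0.98},
    {"id": "HIT_B", "evalue": 0.05, "identity": 0.35},
    {"id": "HIT_C", "evalue": 2.0, "identity": 0.90},
]


def apply_cutoffs(
    hits: List[Dict[str, Any]],
    evalue_cutoff: float = 0.1,
    identity_cutoff: float = 0.0,
) -> List[str]:
    """Keep hits with E-value at or below its cutoff and identity at or above its cutoff."""
    return [
        hit["id"]
        for hit in hits
        if hit["evalue"] <= evalue_cutoff and hit["identity"] >= identity_cutoff
    ]


print(apply_cutoffs(hits))
print(apply_cutoffs(hits, identity_cutoff=0.9))
```

With the default cutoffs HIT_C is dropped for its high E-value; raising identity_cutoff to 0.9 also drops HIT_B.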

class rcsbsearchapi.search.Session(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)

A single query session.

Handles paging the query and parsing results
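Paging of this kind can be sketched as a generator that requests fixed-size pages until a short page comes back. This is a generic illustration, not the Session implementation: paged_ids and fake_fetch are hypothetical, and fake_fetch slices a local list instead of contacting the search service.

```python
from typing import Callable, Iterator, List


def paged_ids(fetch_page: Callable[[int, int], List[str]], rows: int) -> Iterator[str]:
    """Yield IDs page by page until a page shorter than `rows` is returned."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        yield from page
        if len(page) < rows:
            break
        start += rows


# Stand-in for a remote service: seven placeholder IDs held locally.
ALL_IDS = [f"ID{i}" for i in range(7)]


def fake_fetch(start: int, rows: int) -> List[str]:
    return ALL_IDS[start:start + rows]


ids = list(paged_ids(fake_fetch, rows=3))
print(ids)
```

The caller sees one flat stream of IDs even though three pages were requested behind the scenes.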

__init__(query: rcsbsearchapi.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance, mol_definition] = 'entry', rows: int = 10000, return_content_type: List[typing_extensions.Literal[experimental, computational]] = ['experimental'], results_verbosity: typing_extensions.Literal[compact, minimal, verbose] = 'compact', return_counts: bool = False, facets: Optional[List[Union[rcsbsearchapi.search.Facet, rcsbsearchapi.search.FilterFacet]]] = None, group_by: Optional[rcsbsearchapi.search.GroupBy] = None, group_by_return_type: Optional[typing_extensions.Literal[groups, representatives]] = None, sort: Optional[List[rcsbsearchapi.search.Sort]] = None, return_explain_metadata: bool = False, scoring_strategy: Optional[typing_extensions.Literal[combined, sequence, seqmotif, strucmotif, structure, chemical, text, text_chem, full_text]] = None)

Initialize self. See help(type(self)) for accurate signature.

iquery(limit: Optional[int] = None)List[str]

Evaluate the query and display an interactive progress bar.

Requires tqdm.

static make_uuid()str

Create a new UUID to identify a query

rcsb_query_builder_url()str

URL to view this query on the RCSB PDB website query builder

rcsb_query_editor_url()str

URL to edit this query in the RCSB PDB query editor

to_dict()Dict

return full json response

class rcsbsearchapi.search.Sort(sort_by: str, direction: Optional[str] = None, filter: Optional[Union[rcsbsearchapi.search.GroupFilter, rcsbsearchapi.search.TerminalFilter]] = None)

control sorting of results

sort_by

“score” to sort by relevancy scores or full attribute name

Type

str

filter

filter for results. Defaults to None.

Type

Optional[Union[GroupFilter, TerminalFilter]], optional

direction

“asc” (ascending) or “desc” (descending). Defaults to None.

Type

str, optional

to_dict()Dict

Get dictionary representing request option, skips values of None

class rcsbsearchapi.search.StructMotifQuery(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', backbone_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, side_chain_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, angle_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, entry_id: Optional[str] = None, url: Optional[str] = None, file_path: Optional[str] = None, file_extension: Optional[str] = None, residue_ids: Optional[list] = None, rmsd_cutoff: int = 2, atom_pairing_scheme: typing_extensions.Literal[ALL, BACKBONE, SIDE_CHAIN, PSEUDO_ATOMS] = 'SIDE_CHAIN', motif_pruning_strategy: typing_extensions.Literal[NONE, KRUSKAL] = 'KRUSKAL', allowed_structures: Optional[list] = None, excluded_structures: Optional[list] = None, limit: Optional[int] = None)

Special case of a terminal for structure motif queries.

If you provide an entry_id, the other optional parameters can be ignored. If you provide a file_url, you must also provide a file_extension. If you provide a file_path, you must also provide a file_extension.

As is standard with Structure Motif Queries, you must include a list of residues.

Positional arguments STRONGLY discouraged.
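The input rules above can be summarized as a pre-flight check. This is a sketch under assumptions: check_motif_inputs is a hypothetical function, not something the package provides, and the messages are invented. Its parameters are keyword-only, echoing the advice against positional arguments.

```python
from typing import List, Optional


def check_motif_inputs(
    *,
    structure_search_type: str = "entry_id",
    entry_id: Optional[str] = None,
    url: Optional[str] = None,
    file_path: Optional[str] = None,
    file_extension: Optional[str] = None,
    residue_ids: Optional[list] = None,
) -> List[str]:
    """Hypothetical check: list the problems with a set of motif query inputs."""
    problems = []
    if not residue_ids:
        problems.append("residue_ids is required")
    if structure_search_type == "entry_id" and entry_id is None:
        problems.append("entry_id is required")
    if structure_search_type == "file_url":
        if url is None:
            problems.append("url is required")
        if file_extension is None:
            problems.append("file_extension is required with a file URL")
    if structure_search_type == "file_upload":
        if file_path is None:
            problems.append("file_path is required")
        if file_extension is None:
            problems.append("file_extension is required with a file path")
    return problems


# The residue entry is a placeholder standing in for a StructureMotifResidue object.
print(check_motif_inputs(entry_id="2MNR", residue_ids=["residue-placeholder"]))
print(check_motif_inputs(structure_search_type="file_url", url="https://example.org/model.cif",
                         residue_ids=["residue-placeholder"]))
```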

__init__(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', backbone_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, side_chain_distance_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, angle_tolerance: typing_extensions.Literal[0, 1, 2, 3] = 1, entry_id: Optional[str] = None, url: Optional[str] = None, file_path: Optional[str] = None, file_extension: Optional[str] = None, residue_ids: Optional[list] = None, rmsd_cutoff: int = 2, atom_pairing_scheme: typing_extensions.Literal[ALL, BACKBONE, SIDE_CHAIN, PSEUDO_ATOMS] = 'SIDE_CHAIN', motif_pruning_strategy: typing_extensions.Literal[NONE, KRUSKAL] = 'KRUSKAL', allowed_structures: Optional[list] = None, excluded_structures: Optional[list] = None, limit: Optional[int] = None)
Parameters
  • structure_search_type (StructEntryType, optional) – how to find given structure (“entry_id”, “file_url”, “file_upload”). Defaults to “entry_id”.

  • backbone_distance_tolerance (StructMotifTolerance, optional) – tolerance for distance between Cα atoms (in Å). Defaults to 1.

  • side_chain_distance_tolerance (StructMotifTolerance, optional) – tolerance for distance between Cβ atoms (in Å). Defaults to 1.

  • angle_tolerance (StructMotifTolerance, optional) – angle between CαCβ vectors (in multiples of 20 degrees). Defaults to 1.

  • entry_id (Optional[str], optional) – if “entry_id” specified, PDB ID or CSM ID. Defaults to None.

  • url (Optional[str], optional) – if “file_url” specified, url to file. Defaults to None.

  • file_path (Optional[str], optional) – if “file_upload” specified, path to file. Defaults to None.

  • file_extension (Optional[str], optional) – if “file_url” specified, type of file linked to (ex: “cif”). Defaults to None.

  • residue_ids (Optional[list], optional) – list of StructureMotifResidue objects. Defaults to None.

  • rmsd_cutoff (int, optional) – upper cutoff for root-mean-square deviation (RMSD) score. Defaults to 2.

  • atom_pairing_scheme (StructMotifAtomPairing, optional) – Which atoms to consider to compute RMSD scores and transformations. Defaults to “SIDE_CHAIN”.

  • motif_pruning_strategy (StructMotifPruning, optional) – specifies how query motifs are pruned (i.e. simplified). Defaults to “KRUSKAL”.

  • allowed_structures (Optional[list], optional) – list of allowed residues specified by strings (ex: [“HIS”, “LYS”]). Defaults to None.

  • excluded_structures (Optional[list], optional) – if the list of structure identifiers is specified, the search will exclude those structures from the search space. Defaults to None.

  • limit (Optional[int], optional) – stop after accepting this many hits. Defaults to None.

class rcsbsearchapi.search.StructSimilarityQuery(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', entry_id: Optional[str] = None, file_url: Optional[str] = None, file_path: Optional[str] = None, structure_input_type: Optional[typing_extensions.Literal[assembly_id, chain_id]] = 'assembly_id', assembly_id: Optional[str] = '1', chain_id: Optional[str] = None, operator: typing_extensions.Literal[strict_shape_match, relaxed_shape_match] = 'strict_shape_match', target_search_space: typing_extensions.Literal[polymer_entity_instance, assembly] = 'assembly', file_format: Optional[str] = None)

Special case of a terminal for structure similarity queries

__init__(structure_search_type: typing_extensions.Literal[entry_id, file_url, file_upload] = 'entry_id', entry_id: Optional[str] = None, file_url: Optional[str] = None, file_path: Optional[str] = None, structure_input_type: Optional[typing_extensions.Literal[assembly_id, chain_id]] = 'assembly_id', assembly_id: Optional[str] = '1', chain_id: Optional[str] = None, operator: typing_extensions.Literal[strict_shape_match, relaxed_shape_match] = 'strict_shape_match', target_search_space: typing_extensions.Literal[polymer_entity_instance, assembly] = 'assembly', file_format: Optional[str] = None)
Parameters
  • structure_search_type (StructEntryType, optional) – how to find given structure (“entry_id”, “file_url”, “file_upload”). Defaults to “entry_id”.

  • entry_id (Optional[str], optional) – if “entry_id” specified, PDB ID or CSM ID. Defaults to None.

  • file_url (Optional[str], optional) – if “file_url” specified, url to file. Defaults to None.

  • file_path (Optional[str], optional) – if “file_upload” specified, path to the local file. Defaults to None.

  • structure_input_type (Optional[StructSimInputType], optional) – type of the given structure (“assembly_id” or “chain_id”). Defaults to “assembly_id”.

  • assembly_id (Optional[str], optional) – if structure_input_type is “assembly_id”, the assembly id number. Defaults to “1”.

  • chain_id (Optional[str], optional) – if structure_input_type is “chain_id”, the chain id letter. Defaults to None.

  • operator (StructSimOperator, optional) – search mode (“strict_shape_match” or “relaxed_shape_match”). Defaults to “strict_shape_match”.

  • target_search_space (StructSimSearchSpace, optional) – target objects against which the query will be compared for shape similarity. Defaults to “assembly”.

  • file_format (Optional[str], optional) – if “file_url” specified, type of file linked to (ex: “cif”). Defaults to None.
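
As a rough sketch of how structure_input_type pairs with assembly_id and chain_id, the helper below assembles keyword arguments for an entry-based query. The names similarity_kwargs and build_similarity_query are hypothetical; the keyword names are taken from the signature above, and the entry ID is only an example value.

```python
from typing import Optional


def similarity_kwargs(
    entry_id: str,
    chain_id: Optional[str] = None,
    assembly_id: str = "1",
) -> dict:
    """Hypothetical helper: keyword arguments for an entry-based query."""
    kwargs = {"structure_search_type": "entry_id", "entry_id": entry_id}
    if chain_id is not None:
        # Compare a single chain of the entry.
        kwargs["structure_input_type"] = "chain_id"
        kwargs["chain_id"] = chain_id
    else:
        # Compare a whole assembly (the documented default).
        kwargs["structure_input_type"] = "assembly_id"
        kwargs["assembly_id"] = assembly_id
    return kwargs


def build_similarity_query(**kwargs):
    """Construct the query; requires rcsbsearchapi to be installed."""
    # Imported lazily so similarity_kwargs works without the package.
    from rcsbsearchapi.search import StructSimilarityQuery

    return StructSimilarityQuery(**kwargs)


print(similarity_kwargs("4HHB"))
print(similarity_kwargs("4HHB", chain_id="A"))
```

Passing the resulting dictionary to build_similarity_query(**kwargs) would construct the query object, assuming the package is available in the environment.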

class rcsbsearchapi.search.StructureMotifResidue(chain_id: Optional[str] = None, struct_oper_id: Optional[str] = None, label_seq_id: Optional[str] = None, exchanges: Optional[list] = None)

Defines a single residue for use with the structure motif search.

__init__(chain_id: Optional[str] = None, struct_oper_id: Optional[str] = None, label_seq_id: Optional[str] = None, exchanges: Optional[list] = None)

Initialize self. See help(type(self)) for accurate signature.

class rcsbsearchapi.search.Terminal(service: Union[List, str], params: Dict[str, Any], node_id: int = 0)

A terminal query node.

Used for doing various types of searches. Accepts a service type and a dictionary of parameters. The set of parameters differs for different search services.

A Terminal can be built directly from a service name and a parameter dictionary, but doing so by hand is tedious. More typically it is built by child classes that each represent one type of search, which allows for more concise queries.

Examples

>>> Terminal("full_text", {"value": "protease"})
>>> Terminal("text", {"attribute": "rcsb_id", "operator": "in", "negation": False, "value": ["5T89", "1TIM"]})
to_dict()

Get dictionary representing this query
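
For orientation, the sketch below builds the kind of dictionary a terminal node is expected to serialize to. It assumes the layout follows the Search API's terminal-node shape (type, service, parameters, plus a node_id); terminal_node is a hypothetical stand-in and not the library's to_dict implementation.

```python
import json


def terminal_node(service: str, params: dict, node_id: int = 0) -> dict:
    """Hypothetical stand-in for the dictionary form of a terminal node."""
    return {
        "type": "terminal",          # assumed node type label
        "service": service,          # e.g. "full_text" or "text"
        "parameters": dict(params),  # copied so the caller's dict is untouched
        "node_id": node_id,
    }


node = terminal_node("full_text", {"value": "protease"})
print(json.dumps(node, indent=2))
```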

class rcsbsearchapi.search.TerminalFilter(attribute: str, operator: typing_extensions.Literal[equals, greater, greater_or_equal, less, less_or_equal, range, exact_match, in, exists], value: Optional[Union[str, int, float, bool, rcsbsearchapi.search.Range, List[str], List[int], List[float]]] = None, negation: bool = False, case_sensitive: bool = False)

A filter based on a single Terminal node. Can be combined into GroupFilters

Parameters
  • attribute (str) – attribute to search on (e.g. struct.title, exptl.method, rcsb_id).

  • operator (Literal[“equals”, “greater”, “greater_or_equal”, “less”, “less_or_equal”, “range”, “exact_match”, “in”, “exists”]) – operation to apply in the search (e.g. “equals”, “exact_match”).

  • value (Optional[Union[str, int, float, bool, Range, List[str], List[int], List[float]]], optional) – the search term(s). Can be a single or multiple words, numbers, dates, date math expressions, or ranges. Defaults to None.

  • negation (bool, optional) – logical not. Defaults to False.

  • case_sensitive (bool, optional) – whether to do case sensitive matching of value. Defaults to False.

to_dict()

Get dictionary representing request option, skips values of None
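
The “skips values of None” behaviour can be sketched as follows. The function filter_to_dict and the flat key layout are hypothetical, chosen to mirror the constructor arguments; the library's actual output may be nested differently.

```python
def filter_to_dict(attribute, operator, value=None, negation=False, case_sensitive=False):
    """Hypothetical sketch: collect the fields and drop any that are None."""
    fields = {
        "attribute": attribute,
        "operator": operator,
        "value": value,
        "negation": negation,
        "case_sensitive": case_sensitive,
    }
    # False is a real value and is kept; only None is skipped.
    return {key: val for key, val in fields.items() if val is not None}


# "exists" takes no value, so the "value" key is omitted entirely.
print(filter_to_dict("rcsb_entry_info.resolution_combined", "exists"))
print(filter_to_dict("exptl.method", "exact_match", value="X-RAY DIFFRACTION"))
```

Comparing against None rather than testing truthiness matters here: a filter with negation=False or value=0 should still carry those fields.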

class rcsbsearchapi.search.TextQuery(value: str)

Special case of a Terminal for free-text queries

__init__(value: str)

Search for the string value anywhere in the text

Parameters

value – free-text query

class rcsbsearchapi.search.Value(value: T)

Represents a value in a query.

In most cases a Value wrapper is unnecessary and the plain Python value can be used directly.

Values can also be used if the Attr object appears on the right:

Value("4HHB") == Attr("rcsb_entry_container_identifiers.entry_id")
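
One way a wrapper can make the right-hand form work is by delegating the comparison to the attribute. The classes below are hypothetical stand-ins written for illustration only; they are not the library's Attr or Value, and the tuple they return merely stands in for a query object.

```python
class SketchAttr:
    """Stand-in for an attribute whose == builds a query instead of a bool."""

    def __init__(self, attribute):
        self.attribute = attribute

    def __eq__(self, other):
        # Unwrap a SketchValue so both operand orders give the same result.
        value = other.value if isinstance(other, SketchValue) else other
        return ("exact_match", self.attribute, value)


class SketchValue:
    """Stand-in wrapper that lets a plain value sit on the left of ==."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, SketchAttr):
            # Hand the comparison to the attribute, which builds the query.
            return other == self
        return NotImplemented


attr = SketchAttr("rcsb_entry_container_identifiers.entry_id")
left = SketchValue("4HHB") == attr
right = attr == "4HHB"
print(left)
print(left == right)
```

With a wrapper on the left, the wrapper's own `__eq__` runs first and can route the call deliberately instead of relying on Python's fallback to the reflected method.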

rcsbsearchapi.search.fileUpload(filepath: str, fmt: str = 'cif')str

Take a file given by a filepath, and return the corresponding URL to use in a structure search. This URL should then be passed through as part of the value parameter, along with the format of the file.