Skip to main content
Skip to main content

DataStore Accessors

DataStore provides 7 accessor namespaces with 185+ methods for domain-specific operations.

AccessorMethodsDescription
.str56String operations
.dt42+DateTime operations
.arr37Array operations (ClickHouse-specific)
.json13JSON parsing (ClickHouse-specific)
.url15URL parsing (ClickHouse-specific)
.ip9IP address operations (ClickHouse-specific)
.geo14Geo/distance operations (ClickHouse-specific)

String Accessor (.str)

All 56 pandas .str methods are supported, plus ClickHouse string functions.

Case Conversion

MethodClickHouseDescription
upper()upper()Convert to uppercase
lower()lower()Convert to lowercase
capitalize()initcap()Capitalize first letter
title()initcap()Title case
swapcase()-Swap case
casefold()lower()Case folding
ds['name_upper'] = ds['name'].str.upper()
ds['name_title'] = ds['name'].str.title()

Length and Size

MethodClickHouseDescription
len()length()String length (bytes)
char_length()char_length()Length in characters
ds['name_len'] = ds['name'].str.len()

Substring and Slicing

MethodClickHouseDescription
slice(start, stop)substring()Extract substring
slice_replace()-Replace slice
left(n)left()Leftmost n characters
right(n)right()Rightmost n characters
get(i)-Character at index
ds['first_3'] = ds['name'].str.slice(0, 3)
ds['last_4'] = ds['name'].str.right(4)

Trimming

MethodClickHouseDescription
strip()trim()Remove whitespace
lstrip()trimLeft()Remove leading whitespace
rstrip()trimRight()Remove trailing whitespace
ds['trimmed'] = ds['text'].str.strip()
MethodClickHouseDescription
contains(pat)position()Contains substring
startswith(pat)startsWith()Starts with prefix
endswith(pat)endsWith()Ends with suffix
find(sub)position()Find position
rfind(sub)-Find from right
index(sub)position()Find or raise
rindex(sub)-Find from right or raise
match(pat)match()Regex match
fullmatch(pat)-Full regex match
count(pat)-Count occurrences
# Contains substring
ds['has_john'] = ds['name'].str.contains('John')

# Regex match
ds['valid_email'] = ds['email'].str.match(r'^[\w.-]+@[\w.-]+\.\w+$')

Replace

MethodClickHouseDescription
replace(pat, repl)replace()Replace occurrences
replace(pat, repl, regex=True)replaceRegexpAll()Regex replace
removeprefix(prefix)-Remove prefix
removesuffix(suffix)-Remove suffix
translate(table)-Translate characters
ds['cleaned'] = ds['text'].str.replace('\n', ' ')
ds['digits_only'] = ds['phone'].str.replace(r'\D', '', regex=True)

Splitting

MethodClickHouseDescription
split(sep)splitByString()Split into array
rsplit(sep)-Split from right
partition(sep)-Split into 3 parts
rpartition(sep)-Split from right into 3
ds['parts'] = ds['path'].str.split('/')

Padding

MethodClickHouseDescription
pad(width)leftPad()Left pad
ljust(width)rightPad()Right justify
rjust(width)leftPad()Left justify
center(width)-Center
zfill(width)leftPad(..., '0')Zero fill
ds['padded_id'] = ds['id'].astype(str).str.zfill(6)

Character Tests

MethodDescription
isalpha()All alphabetic
isdigit()All digits
isalnum()Alphanumeric
isspace()All whitespace
isupper()All uppercase
islower()All lowercase
istitle()Title case
isnumeric()Numeric characters
isdecimal()Decimal characters
ds['is_numeric'] = ds['code'].str.isdigit()

Other

MethodDescription
repeat(n)Repeat n times
reverse()Reverse string
wrap(width)Wrap text
encode(enc)Encode
decode(enc)Decode
normalize(form)Unicode normalize
extract(pat)Extract regex groups
extractall(pat)Extract all matches
cat(sep)Concatenate all
get_dummies(sep)Dummy variables

DateTime Accessor (.dt)

All 42+ pandas .dt methods plus ClickHouse datetime functions.

Date Components

PropertyClickHouseDescription
yeartoYear()Year
monthtoMonth()Month (1-12)
daytoDayOfMonth()Day (1-31)
hourtoHour()Hour (0-23)
minutetoMinute()Minute (0-59)
secondtoSecond()Second (0-59)
millisecondtoMillisecond()Millisecond
microsecondtoMicrosecond()Microsecond
quartertoQuarter()Quarter (1-4)
dayofweektoDayOfWeek()Day of week (0=Mon)
dayofyeartoDayOfYear()Day of year
weektoWeek()Week number
days_in_month-Days in month
ds['year'] = ds['date'].dt.year
ds['month'] = ds['date'].dt.month
ds['day_of_week'] = ds['date'].dt.dayofweek

Truncation

MethodClickHouseDescription
to_start_of_day()toStartOfDay()Start of day
to_start_of_week()toStartOfWeek()Start of week
to_start_of_month()toStartOfMonth()Start of month
to_start_of_quarter()toStartOfQuarter()Start of quarter
to_start_of_year()toStartOfYear()Start of year
to_start_of_hour()toStartOfHour()Start of hour
to_start_of_minute()toStartOfMinute()Start of minute
ds['month_start'] = ds['date'].dt.to_start_of_month()

Arithmetic

MethodClickHouseDescription
add_years(n)addYears()Add years
add_months(n)addMonths()Add months
add_weeks(n)addWeeks()Add weeks
add_days(n)addDays()Add days
add_hours(n)addHours()Add hours
add_minutes(n)addMinutes()Add minutes
add_seconds(n)addSeconds()Add seconds
subtract_years(n)subtractYears()Subtract years
subtract_months(n)subtractMonths()Subtract months
subtract_days(n)subtractDays()Subtract days
ds['next_month'] = ds['date'].dt.add_months(1)
ds['last_week'] = ds['date'].dt.subtract_weeks(1)

Boolean Checks

MethodDescription
is_month_start()First day of month
is_month_end()Last day of month
is_quarter_start()First day of quarter
is_quarter_end()Last day of quarter
is_year_start()First day of year
is_year_end()Last day of year
is_leap_year()Leap year
ds['is_eom'] = ds['date'].dt.is_month_end()

Formatting

MethodClickHouseDescription
strftime(fmt)formatDateTime()Format as string
day_name()-Day name
month_name()-Month name
ds['date_str'] = ds['date'].dt.strftime('%Y-%m-%d')
ds['day_name'] = ds['date'].dt.day_name()

Timezone

MethodClickHouseDescription
tz_convert(tz)toTimezone()Convert timezone
tz_localize(tz)-Localize timezone
ds['utc_time'] = ds['timestamp'].dt.tz_convert('UTC')

Array Accessor (.arr)

ClickHouse-specific array operations (37 methods).

Properties

PropertyClickHouseDescription
lengthlength()Array length
sizelength()Alias for length
emptyempty()Is empty
not_emptynotEmpty()Is not empty
ds['tag_count'] = ds['tags'].arr.length
ds['has_tags'] = ds['tags'].arr.not_empty

Element Access

MethodClickHouseDescription
array_first()arrayElement(..., 1)First element
array_last()arrayElement(..., -1)Last element
array_element(n)arrayElement()Nth element
array_slice(off, len)arraySlice()Slice array
ds['first_tag'] = ds['tags'].arr.array_first()
ds['last_tag'] = ds['tags'].arr.array_last()

Aggregations

MethodClickHouseDescription
array_sum()arraySum()Sum of elements
array_avg()arrayAvg()Average
array_min()arrayMin()Minimum
array_max()arrayMax()Maximum
array_product()arrayProduct()Product
array_uniq()arrayUniq()Count unique
ds['total'] = ds['values'].arr.array_sum()
ds['average'] = ds['values'].arr.array_avg()

Transformations

MethodClickHouseDescription
array_sort()arraySort()Sort ascending
array_reverse_sort()arrayReverseSort()Sort descending
array_reverse()arrayReverse()Reverse order
array_distinct()arrayDistinct()Unique elements
array_compact()arrayCompact()Remove consecutive dupes
array_flatten()arrayFlatten()Flatten nested
ds['sorted_tags'] = ds['tags'].arr.array_sort()
ds['unique_tags'] = ds['tags'].arr.array_distinct()

Modifications

MethodClickHouseDescription
array_push_back(elem)arrayPushBack()Add to end
array_push_front(elem)arrayPushFront()Add to front
array_pop_back()arrayPopBack()Remove last
array_pop_front()arrayPopFront()Remove first
array_concat(other)arrayConcat()Concatenate
MethodClickHouseDescription
has(elem)has()Contains element
index_of(elem)indexOf()Find index
count_equal(elem)countEqual()Count occurrences
ds['has_python'] = ds['skills'].arr.has('Python')

String Operations

MethodClickHouseDescription
array_string_concat(sep)arrayStringConcat()Join to string
ds['tags_str'] = ds['tags'].arr.array_string_concat(', ')

JSON Accessor (.json)

ClickHouse-specific JSON parsing (13 methods).

MethodClickHouseDescription
get_string(path)JSONExtractString()Extract string
get_int(path)JSONExtractInt()Extract integer
get_float(path)JSONExtractFloat()Extract float
get_bool(path)JSONExtractBool()Extract boolean
get_raw(path)JSONExtractRaw()Extract raw JSON
get_keys()JSONExtractKeys()Get keys
get_type(path)JSONType()Get type
get_length(path)JSONLength()Get length
has_key(key)JSONHas()Check key exists
is_valid()isValidJSON()Validate JSON
to_json_string()toJSONString()Convert to JSON
# Parse JSON columns
ds['user_name'] = ds['json_data'].json.get_string('user.name')
ds['user_age'] = ds['json_data'].json.get_int('user.age')
ds['is_active'] = ds['json_data'].json.get_bool('user.active')
ds['has_email'] = ds['json_data'].json.has_key('user.email')

URL Accessor (.url)

ClickHouse-specific URL parsing (15 methods).

MethodClickHouseDescription
domain()domain()Extract domain
domain_without_www()domainWithoutWWW()Domain without www
top_level_domain()topLevelDomain()TLD
protocol()protocol()Protocol (http/https)
path()path()URL path
path_full()pathFull()Path with query
query_string()queryString()Query string
fragment()fragment()Fragment (#...)
port()port()Port number
extract_url_parameter(name)extractURLParameter()Get query param
extract_url_parameters()extractURLParameters()All params
cut_url_parameter(name)cutURLParameter()Remove param
decode_url_component()decodeURLComponent()URL decode
encode_url_component()encodeURLComponent()URL encode
# Parse URLs
ds['domain'] = ds['url'].url.domain()
ds['path'] = ds['url'].url.path()
ds['utm_source'] = ds['url'].url.extract_url_parameter('utm_source')

IP Accessor (.ip)

ClickHouse-specific IP address operations (9 methods).

MethodClickHouseDescription
to_ipv4()toIPv4()Convert to IPv4
to_ipv6()toIPv6()Convert to IPv6
ipv4_num_to_string()IPv4NumToString()Number to string
ipv4_string_to_num()IPv4StringToNum()String to number
ipv6_num_to_string()IPv6NumToString()IPv6 num to string
ipv4_to_ipv6()IPv4ToIPv6()Convert to IPv6
is_ipv4_string()isIPv4String()Validate IPv4
is_ipv6_string()isIPv6String()Validate IPv6
ipv4_cidr_to_range(cidr)IPv4CIDRToRange()CIDR to range
# IP operations
ds['is_valid_ip'] = ds['ip'].ip.is_ipv4_string()
ds['ip_num'] = ds['ip'].ip.ipv4_string_to_num()

Geo Accessor (.geo)

ClickHouse-specific geo/distance operations (14 methods).

Distance Functions

MethodClickHouseDescription
great_circle_distance(...)greatCircleDistance()Great circle distance
geo_distance(...)geoDistance()WGS-84 distance
l1_distance(v1, v2)L1Distance()Manhattan distance
l2_distance(v1, v2)L2Distance()Euclidean distance
l2_squared_distance(v1, v2)L2SquaredDistance()Squared Euclidean
linf_distance(v1, v2)LinfDistance()Chebyshev distance
cosine_distance(v1, v2)cosineDistance()Cosine distance

Vector Operations

MethodClickHouseDescription
dot_product(v1, v2)dotProduct()Dot product
l2_norm(vec)L2Norm()Vector norm
l2_normalize(vec)L2Normalize()Normalize

H3 Functions

MethodClickHouseDescription
geo_to_h3(lon, lat, res)geoToH3()Geo to H3 index
h3_to_geo(h3)h3ToGeo()H3 to geo coords

Point Operations

MethodClickHouseDescription
point_in_polygon(pt, poly)pointInPolygon()Point in polygon
point_in_ellipses(...)pointInEllipses()Point in ellipses
from chdb.datastore import F

# Calculate distances
ds['distance'] = F.great_circle_distance(
    ds['lon1'], ds['lat1'],
    ds['lon2'], ds['lat2']
)

# Vector similarity
ds['similarity'] = F.cosine_distance(ds['embedding1'], ds['embedding2'])

Using Accessors

Lazy Evaluation

Most accessor methods are lazy - they return expressions that are evaluated later:

# All these are lazy
ds['name_upper'] = ds['name'].str.upper()  # Not executed yet
ds['year'] = ds['date'].dt.year            # Not executed yet
ds['domain'] = ds['url'].url.domain()      # Not executed yet

# Execution happens when you access results
df = ds.to_df()  # Now everything executes

Methods That Execute Immediately

Some .str methods must execute because they change the structure:

MethodReturnsWhy
partition(sep)DataStore (3 columns)Creates multiple columns
rpartition(sep)DataStore (3 columns)Creates multiple columns
get_dummies(sep)DataStore (N columns)Dynamic column count
extractall(pat)DataStoreMultiIndex result
cat(sep)strAggregation (N rows → 1)

Chaining Accessors

Accessor methods can be chained:

ds['clean_name'] = (ds['name']
    .str.strip()
    .str.lower()
    .str.replace(' ', '_')
)

ds['next_month_start'] = (ds['date']
    .dt.add_months(1)
    .dt.to_start_of_month()
)