Pandas Compatibility
DataStore implements 209 pandas DataFrame methods to provide API compatibility with pandas, so existing pandas code works with minimal changes.
Compatibility Approach
Key principles:
- All 209 pandas DataFrame methods implemented
- Lazy evaluation for SQL optimization
- Automatic type wrapping (DataFrame → DataStore, Series → ColumnExpr)
- Immutable operations (no inplace=True)
Attributes and Properties
| Property | Description | Triggers Execution |
|---|---|---|
| shape | (rows, columns) tuple | Yes |
| columns | Column names (Index) | Yes |
| dtypes | Column data types | Yes |
| values | NumPy array | Yes |
| index | Row index | Yes |
| size | Number of elements | Yes |
| ndim | Number of dimensions | No |
| empty | Is DataFrame empty | Yes |
| T | Transpose | Yes |
| axes | List of axes | Yes |
Examples:
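
A minimal sketch of these attributes on a plain pandas DataFrame. Per the table above, the same attribute names are assumed to exist on a DataStore object, where most of them trigger execution. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]})

shape = df.shape            # (rows, columns) tuple
columns = list(df.columns)  # column names as a plain list
size = df.size              # total number of elements (rows * columns)
ndim = df.ndim              # a DataFrame always has 2 dimensions
is_empty = df.empty         # True only when the frame holds no elements

print(shape, columns, size, ndim, is_empty)
```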
Indexing and Selection
| Method | Description | Example |
|---|---|---|
| df['col'] | Select column | ds['age'] |
| df[['col1', 'col2']] | Select columns | ds[['name', 'age']] |
| df[condition] | Boolean indexing | ds[ds['age'] > 25] |
| df.loc[...] | Label-based access | ds.loc[0:10, 'name'] |
| df.iloc[...] | Integer-based access | ds.iloc[0:10, 0:3] |
| df.at[...] | Single value by label | ds.at[0, 'name'] |
| df.iat[...] | Single value by position | ds.iat[0, 0] |
| df.head(n) | First n rows | ds.head(10) |
| df.tail(n) | Last n rows | ds.tail(10) |
| df.sample(n) | Random sample | ds.sample(100) |
| df.select_dtypes() | Select by dtype | ds.select_dtypes(include='number') |
| df.query() | Query expression | ds.query('age > 25') |
| df.where() | Conditional replace | ds.where(ds['age'] > 0, 0) |
| df.mask() | Inverse where | ds.mask(ds['age'] < 0, 0) |
| df.isin() | Value membership | ds['city'].isin(['NYC', 'LA']) |
| df.get() | Safe column access | ds.get('col', default=None) |
| df.xs() | Cross-section | ds.xs('key') |
| df.pop() | Remove column | ds.pop('col') |
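
The selection calls in the table can be tried out in plain pandas first. The sketch below uses pandas only, with hypothetical data; a DataStore object is assumed to accept the same expressions, as the Example column suggests.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Di"],
        "age": [31, 25, 47, 19],
        "city": ["NYC", "LA", "SF", "NYC"],
    }
)

adults = df[df["age"] > 25]                    # boolean indexing
by_query = df.query("age > 25")                # same filter as a query string
first_two = df.loc[0:1, "name"]                # label slice: end label is included
block = df.iloc[0:2, 0:2]                      # positional slice: end is excluded
coastal = df[df["city"].isin(["NYC", "LA"])]   # value membership
missing = df.get("salary", default=None)       # absent column returns the default
```

Note the difference between `loc` and `iloc` slicing: the label slice `0:1` returns two rows, while the positional slice `0:2` is needed for the same result.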
Statistical Methods
| Method | Description | SQL Equivalent |
|---|---|---|
| mean() | Mean value | AVG() |
| median() | Median value | MEDIAN() |
| mode() | Mode value | - |
| std() | Standard deviation | STDDEV() |
| var() | Variance | VAR() |
| min() | Minimum | MIN() |
| max() | Maximum | MAX() |
| sum() | Sum | SUM() |
| prod() | Product | - |
| count() | Non-null count | COUNT() |
| nunique() | Unique count | UNIQ() |
| value_counts() | Value frequencies | GROUP BY |
| quantile() | Quantile | QUANTILE() |
| describe() | Summary statistics | - |
| corr() | Correlation matrix | CORR() |
| cov() | Covariance matrix | COV() |
| corrwith() | Pairwise correlation | - |
| rank() | Rank values | RANK() |
| abs() | Absolute values | ABS() |
| round() | Round values | ROUND() |
| clip() | Clip values | - |
| cumsum() | Cumulative sum | Window function |
| cumprod() | Cumulative product | Window function |
| cummin() | Cumulative min | Window function |
| cummax() | Cumulative max | Window function |
| diff() | Difference | Window function |
| pct_change() | Percent change | Window function |
| skew() | Skewness | SKEW() |
| kurt() | Kurtosis | KURT() |
| sem() | Standard error | - |
| all() | All true | - |
| any() | Any true | - |
| idxmin() | Index of min | - |
| idxmax() | Index of max | - |
Examples:
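
A short sketch of a few of these methods in plain pandas, with hypothetical data. The SQL that DataStore generates for each call is not shown here; only the pandas-side behaviour is illustrated.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0], "qty": [1, 2, 2, 5]})

mean_price = df["price"].mean()        # arithmetic mean of the column
median_price = df["price"].median()    # middle value
unique_qty = df["qty"].nunique()       # number of distinct values
counts = df["qty"].value_counts()      # frequency of each value
running = df["price"].cumsum()         # cumulative sum, row by row
step = df["price"].diff()              # difference from the previous row
summary = df.describe()                # count, mean, std, min, quartiles, max
```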
Data Manipulation
| Method | Description |
|---|---|
| drop() | Drop rows/columns |
| drop_duplicates() | Remove duplicates |
| duplicated() | Mark duplicates |
| dropna() | Remove missing values |
| fillna() | Fill missing values |
| ffill() | Forward fill |
| bfill() | Backward fill |
| interpolate() | Interpolate values |
| replace() | Replace values |
| rename() | Rename columns/index |
| rename_axis() | Rename axis |
| assign() | Add new columns |
| astype() | Convert types |
| convert_dtypes() | Infer types |
| copy() | Copy DataFrame |
Examples:
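
Because every method returns a new object, these calls chain naturally. The sketch below is plain pandas with hypothetical data; it is assumed, not verified, that the same chain runs unchanged on a DataStore.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Bob", None], "score": [2.0, None, None, 4.0]}
)

cleaned = (
    df.drop_duplicates()                        # drops the repeated ("Bob", NaN) row
      .dropna(subset=["name"])                  # drops the row without a name
      .fillna({"score": 0.0})                   # fills the remaining missing score
      .rename(columns={"score": "points"})      # renames a column
      .assign(bonus=lambda d: d["points"] * 2)  # adds a derived column
      .astype({"points": "int64"})              # converts the column type
)

print(cleaned)
```

The original `df` still has all four rows after the chain runs, which is the behaviour the immutability principle relies on.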
Sorting and Ranking
| Method | Description |
|---|---|
| sort_values() | Sort by values |
| sort_index() | Sort by index |
| nlargest() | N largest values |
| nsmallest() | N smallest values |
Examples:
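
A minimal sketch of the four sorting methods in plain pandas, with hypothetical data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy", "Di"], "age": [31, 25, 47, 19]})

by_age = df.sort_values("age", ascending=False)  # oldest first
oldest_two = df.nlargest(2, "age")               # top 2 rows by age
youngest = df.nsmallest(1, "age")                # bottom 1 row by age
restored = by_age.sort_index()                   # back to the original row order
```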
Reshaping
| Method | Description |
|---|---|
| pivot() | Pivot table |
| pivot_table() | Pivot with aggregation |
| melt() | Unpivot |
| stack() | Stack columns to index |
| unstack() | Unstack index to columns |
| transpose() / T | Transpose |
| explode() | Explode lists to rows |
| squeeze() | Reduce dimensions |
| droplevel() | Drop index level |
| swaplevel() | Swap index levels |
| reorder_levels() | Reorder levels |
Examples:
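
The sketch below shows `melt()` and `pivot()` as inverse operations, plus `explode()`, in plain pandas. The column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one column per quarter.
wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# Unpivot: one row per (id, quarter) pair.
long_df = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# Pivot back: one row per id, one column per quarter.
back = long_df.pivot(index="id", columns="quarter", values="sales")

# Explode: one row per list element.
tags = pd.DataFrame({"id": [1, 2], "tag": [["a", "b"], ["c"]]}).explode("tag")
```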
Combining / Joining
| Method | Description |
|---|---|
| merge() | SQL-style merge |
| join() | Join on index |
| concat() | Concatenate |
| append() | Append rows |
| combine() | Combine with function |
| combine_first() | Combine with priority |
| update() | Update values |
| compare() | Show differences |
Examples:
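
A minimal merge and concatenation sketch in plain pandas, with hypothetical tables. How DataStore exposes `concat()` and `append()` is not stated in this section, so only the pandas forms are shown.

```python
import pandas as pd

# Hypothetical sample tables, used only for illustration.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"user_id": [1, 1, 3], "total": [9.5, 20.0, 5.0]})

inner = users.merge(orders, on="user_id", how="inner")  # only users with orders
left = users.merge(orders, on="user_id", how="left")    # every user, NaN if no order
stacked = pd.concat([users, users], ignore_index=True)  # rows of both frames
```

In pandas itself, `concat` is a top-level function rather than a DataFrame method, and `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is the portable way to append rows.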
Binary Operations
| Method | Description |
|---|---|
| add() / radd() | Addition |
| sub() / rsub() | Subtraction |
| mul() / rmul() | Multiplication |
| div() / rdiv() | Division |
| truediv() / rtruediv() | True division |
| floordiv() / rfloordiv() | Floor division |
| mod() / rmod() | Modulo |
| pow() / rpow() | Power |
| dot() | Matrix multiplication |
Examples:
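
The method forms below are equivalent to the arithmetic operators; the `r`-prefixed variants swap the operands. This is a plain pandas sketch with hypothetical data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

plus = df.add(1)               # same as df + 1
reverse = df.rsub(100)         # same as 100 - df
ratio = df["b"].div(df["a"])   # element-wise b / a
floor = df["b"].floordiv(3)    # same as df["b"] // 3
matrix = df.T.dot(df)          # (2 x 3) times (3 x 2) gives a 2 x 2 frame
```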
Comparison Operations
| Method | Description |
|---|---|
| eq() | Equal |
| ne() | Not equal |
| lt() | Less than |
| le() | Less than or equal |
| gt() | Greater than |
| ge() | Greater than or equal |
| equals() | Test equality |
| compare() | Show differences |
Function Application
| Method | Description |
|---|---|
| apply() | Apply function |
| applymap() | Apply element-wise |
| map() | Map values |
| agg() / aggregate() | Aggregate |
| transform() | Transform |
| pipe() | Pipe functions |
| groupby() | Group by |
Examples:
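
A sketch of the main function-application methods in plain pandas, with hypothetical data. Whether DataStore can push arbitrary Python lambdas down to SQL is not covered in this section, so the example makes no claim about that.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"team": ["x", "x", "y"], "points": [3, 5, 4]})

doubled = df["points"].apply(lambda v: v * 2)                # element-wise function
labels = df["team"].map({"x": "Red", "y": "Blue"})           # value lookup
totals = df.groupby("team")["points"].agg(["sum", "mean"])   # one row per group
share = df.groupby("team")["points"].transform(lambda s: s / s.sum())  # same length as df
piped = df.pipe(lambda d: d[d["points"] > 3])                # pass the frame to a function
```

`agg()` reduces each group to one row, while `transform()` returns a result aligned with the original rows.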
Time Series
| Method | Description |
|---|---|
| rolling() | Rolling window |
| expanding() | Expanding window |
| ewm() | Exponentially weighted |
| resample() | Resample time series |
| shift() | Shift values |
| asfreq() | Convert frequency |
| asof() | Latest value as of |
| at_time() | Select at time |
| between_time() | Select time range |
| first() / last() | First/last periods |
| to_period() | Convert to period |
| to_timestamp() | Convert to timestamp |
| tz_convert() | Convert timezone |
| tz_localize() | Localize timezone |
Examples:
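
A minimal window and resampling sketch in plain pandas, using a hypothetical daily series of six values.

```python
import pandas as pd

# Hypothetical daily series, used only for illustration.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

rolling_mean = ts["value"].rolling(window=3).mean()  # NaN until 3 values are available
expanding_sum = ts["value"].expanding().sum()        # running total from the start
lagged = ts["value"].shift(1)                        # previous day's value
two_day = ts.resample("2D").sum()                    # totals per 2-day bucket
```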
Missing Data
| Method | Description |
|---|---|
| isna() / isnull() | Detect missing |
| notna() / notnull() | Detect non-missing |
| dropna() | Drop missing |
| fillna() | Fill missing |
| ffill() | Forward fill |
| bfill() | Backward fill |
| interpolate() | Interpolate |
| replace() | Replace values |
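
The fill strategies in the table differ only in where the replacement value comes from. The plain pandas sketch below applies each one to the same hypothetical series.

```python
import pandas as pd

# Hypothetical series with a two-value gap in the middle.
s = pd.Series([1.0, None, None, 4.0])

mask = s.isna()            # True where a value is missing
forward = s.ffill()        # carry the last known value forward
backward = s.bfill()       # pull the next known value backward
linear = s.interpolate()   # linear interpolation between known values
constant = s.fillna(0.0)   # replace with a fixed value
```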
I/O Methods
| Method | Description |
|---|---|
| to_csv() | Export to CSV |
| to_json() | Export to JSON |
| to_excel() | Export to Excel |
| to_parquet() | Export to Parquet |
| to_feather() | Export to Feather |
| to_sql() | Export to SQL database |
| to_pickle() | Pickle |
| to_html() | HTML table |
| to_latex() | LaTeX table |
| to_markdown() | Markdown table |
| to_string() | String representation |
| to_dict() | Dictionary |
| to_records() | Records |
| to_numpy() | NumPy array |
| to_clipboard() | Clipboard |
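
A small sketch of three in-memory exports in plain pandas, with hypothetical data. Exporting necessarily materializes the data, so on a DataStore these calls presumably trigger execution; that is an assumption, not something this section states.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

csv_text = df.to_csv(index=False)        # no path given, so a string is returned
records = df.to_dict(orient="records")   # list with one dict per row
array = df.to_numpy()                    # 2-D NumPy array of the values

print(csv_text)
```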
See I/O Operations for detailed documentation.
Iteration
| Method | Description |
|---|---|
| items() | Iterate (column, Series) |
| iterrows() | Iterate (index, Series) |
| itertuples() | Iterate as namedtuples |
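
The three iterators differ in what each step yields. The sketch below is plain pandas with hypothetical data; iterating over a lazily evaluated object presumably forces execution first, which is an assumption here.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

# items(): one (column name, Series) pair per column.
column_names = [col for col, _series in df.items()]

# iterrows(): one (index, Series) pair per row.
row_names = [row["name"] for _idx, row in df.iterrows()]

# itertuples(): one namedtuple per row, with columns as attributes.
ages = [t.age for t in df.itertuples(index=False)]
```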
Key Differences from Pandas
1. Return Types
2. Lazy Execution
3. No inplace Parameter
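
Code that relies on `inplace=True` can be ported by rebinding the name to the returned object. The sketch below uses plain pandas with hypothetical data to show the portable style; it assumes DataStore methods return a new object in the same way, as the immutability principle implies.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# pandas-only style that relies on in-place mutation:
#   df.drop(columns=["b"], inplace=True)

# Portable style: keep the returned object instead.
trimmed = df.drop(columns=["b"])
renamed = trimmed.rename(columns={"a": "x"})

print(list(renamed.columns), list(df.columns))
```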
4. Comparing Results
See Key Differences for complete details.