Table of Contents
- DataFrame
- toDSV
- toCSV
- toTSV
- toPSV
- toText
- toJSON
- toDict
- toArray
- toCollection
- show
- dim
- transpose
- count
- countValue
- push
- replace
- distinct
- unique
- listColumns
- select
- withColumn
- restructure
- renameAll
- rename
- castAll
- cast
- drop
- chain
- filter
- where
- find
- map
- reduce
- reduceRight
- dropDuplicates
- dropMissingValues
- fillMissingValues
- shuffle
- sample
- bisect
- groupBy
- sortBy
- union
- join
- innerJoin
- fullJoin
- outerJoin
- leftJoin
- rightJoin
- diff
- head
- tail
- slice
- getRow
- setRow
- setDefaultModules
- fromDSV
- fromText
- fromCSV
- fromTSV
- fromPSV
- fromJSON
DataFrame
DataFrame data structure providing an immutable, flexible and powerful way to manipulate data with columns and rows.
Parameters
data
(Array | Object | DataFrame) The data of the DataFrame.
columns
Array The DataFrame column names.
options
Object Additional options. Example: modules. (optional, default {})
toDSV
Convert the DataFrame into a delimiter-separated values (DSV) string. You can also save the file if you are using Node.js.
Parameters
args
...any
sep
String Column separator. (optional, default ' ')
header
Boolean Whether to write the header in the first line. If false, there will be no header. (optional, default true)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toDSV()
df.toDSV(';')
df.toDSV(';', true)
// From node.js only
df.toDSV(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file as a raw string.
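To make the output format concrete, here is a minimal sketch of DSV serialization on plain arrays. `toDSVString` is a hypothetical helper for illustration, not dataframe-js's implementation, and it assumes values never contain the separator or line breaks (no quoting or escaping).

```javascript
// Hypothetical helper illustrating DSV serialization; not the library's code.
// Assumes values never contain the separator or newlines (no quoting/escaping).
function toDSVString(columns, rows, sep, header = true) {
  // Each row becomes one line of values joined by the separator.
  const lines = rows.map((row) => row.map((value) => String(value)).join(sep));
  // Prepend the header line only when requested.
  return (header ? [columns.join(sep), ...lines] : lines).join('\n');
}

console.log(toDSVString(['id', 'name'], [[1, 'a'], [2, 'b']], ','));
// id,name
// 1,a
// 2,b
```

With `header` set to false, only the data lines are emitted, which matches the documented behavior of the `header` parameter.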
toCSV
Convert the DataFrame into a comma-separated values (CSV) string. You can also save the file if you are using Node.js.
Parameters
args
...any
header
Boolean Whether to write the header in the first line. If false, there will be no header. (optional, default true)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toCSV()
df.toCSV(true)
// From node.js only
df.toCSV(true, '/my/absolute/path/dataframe.csv')
Returns String The CSV file as a raw string.
toTSV
Convert the DataFrame into a tab-separated values (TSV) string. You can also save the file if you are using Node.js.
Parameters
args
...any
header
Boolean Whether to write the header in the first line. If false, there will be no header. (optional, default true)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toTSV()
df.toTSV(true)
// From node.js only
df.toTSV(true, '/my/absolute/path/dataframe.tsv')
Returns String The TSV file as a raw string.
toPSV
Convert the DataFrame into a pipe-separated values (PSV) string. You can also save the file if you are using Node.js.
Parameters
args
...any
header
Boolean Whether to write the header in the first line. If false, there will be no header. (optional, default true)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toPSV()
df.toPSV(true)
// From node.js only
df.toPSV(true, '/my/absolute/path/dataframe.psv')
Returns String The PSV file as a raw string.
toText
Convert the DataFrame into a delimiter-separated values string. Alias of .toDSV(). You can also save the file if you are using Node.js.
Parameters
args
...any
sep
String Column separator. (optional, default ' ')
header
Boolean Whether to write the header in the first line. If false, there will be no header. (optional, default true)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toText()
df.toText(';')
df.toText(';', true)
// From node.js only
df.toText(';', true, '/my/absolute/path/dataframe.txt')
Returns String The text file as a raw string.
toJSON
Convert the DataFrame into a JSON string. You can also save the file if you are using Node.js.
Parameters
args
...any
asCollection
Boolean Whether to write the JSON as a collection of Objects. (optional, default false)
path
String? The path to save the file. /!\ Works only on Node.js, not in the browser.
Examples
df.toJSON()
// From node.js only
df.toJSON(false, '/my/absolute/path/dataframe.json')
Returns String The JSON file as a raw string.
toDict
Convert the DataFrame into a dict / hash / object.
Examples
df.toDict()
Returns Object The DataFrame converted into dict.
toArray
Convert the DataFrame into an Array of Arrays. You can also extract a single column as an Array.
Parameters
columnName
String? The name of the column to extract. By default, all columns are converted.
Examples
df.toArray()
Returns Array The DataFrame (or the column) converted into Array.
toCollection
Convert the DataFrame into an Array of dictionaries. You can also return Rows instead of dictionaries.
Parameters
ofRows
Boolean? Return a collection of Rows instead of dictionaries.
Examples
df.toCollection()
Returns Array The DataFrame converted into an Array of dictionaries (or Rows).
show
Display the DataFrame as a String Table. It can also return the string instead of displaying the DataFrame.
Parameters
rows
Number The number of lines to display. (optional, default 10)
quiet
Boolean Quiet mode. If true, only returns a string instead of calling console.log(). (optional, default false)
Examples
df.show()
df.show(10)
const stringDF = df.show(10, true)
Returns String The DataFrame as String Table.
dim
Get the DataFrame dimensions.
Examples
const [height, width] = df.dim()
Returns Array The DataFrame dimensions. [height, width]
transpose
Transpose a DataFrame. Rows become columns and vice versa: n x p => p x n.
Parameters
transposeColumnNames
Boolean An option to transpose the column names into a rowNames column. (optional, default false)
Examples
df.transpose()
Returns DataFrame A new transposed DataFrame.
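The n x p => p x n reshaping is easiest to picture on a plain array of rows. The sketch below shows only the index swap on raw arrays; it is an illustration, not the library's code, and it does not model column names or the transposeColumnNames option.

```javascript
// Transpose an n x p matrix (array of rows) into a p x n matrix.
// Assumes every row has the same length as the first one.
function transpose(rows) {
  if (rows.length === 0) return [];
  // Column j of the input becomes row j of the output.
  return rows[0].map((_, colIndex) => rows.map((row) => row[colIndex]));
}

const data = [
  [1, 2, 3],
  [4, 5, 6],
]; // 2 x 3
console.log(transpose(data)); // [ [ 1, 4 ], [ 2, 5 ], [ 3, 6 ] ] (3 x 2)
```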
count
Get the number of rows.
Examples
df.count()
Returns Int The number of DataFrame rows.
countValue
Get the number of occurrences of a value in a column.
Parameters
valueToCount
The value to count in the selected column.
columnName
String The column in which to count the value. (optional, default this.listColumns()[0])
Examples
df.countValue(5, 'column2')
df.select('column1').countValue(5)
Returns Int The number of times the selected value appears.
push
Push new rows into the DataFrame.
Parameters
rows
(Array | Row) The rows to add.
Examples
df.push([1,2,3], [1,4,9])
Returns DataFrame A new DataFrame with the new rows.
replace
Replace a value by another in all the DataFrame or in a column.
Parameters
value
The value to replace.
replacement
The new value.
columnNames
(String | Array) The columns to apply the replacement to. (optional, default this.listColumns())
Examples
df.replace(undefined, 0, ['column1', 'column2'])
Returns DataFrame A new DataFrame with replaced values.
distinct
Compute the unique values of a column.
Parameters
columnName
String The column from which to take distinct values.
Examples
df.distinct('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
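The idea behind distinct values can be sketched on a plain array of row objects. This assumes first-occurrence order is kept and that equality follows JavaScript's `Set` (SameValueZero, so `NaN` matches `NaN`); whether dataframe-js behaves exactly this way is an assumption. `distinctValues` is a hypothetical helper.

```javascript
// Hypothetical helper: collect the distinct values of one column,
// keeping first-occurrence order. Set uses SameValueZero equality.
function distinctValues(rows, columnName) {
  return [...new Set(rows.map((row) => row[columnName]))];
}

const rows = [{ column1: 'a' }, { column1: 'b' }, { column1: 'a' }];
console.log(distinctValues(rows, 'column1')); // [ 'a', 'b' ]
```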
unique
Compute the unique values of a column. Alias of .distinct().
Parameters
columnName
String The column from which to take distinct values.
Examples
df.unique('column1')
Returns DataFrame A DataFrame containing the column with distinct values.
listColumns
List DataFrame columns.
Examples
df.listColumns()
Returns Array An Array containing DataFrame columnNames.
select
Select columns in the DataFrame.
Parameters
columnNames
...String The columns to select.
Examples
df.select('column1', 'column3')
Returns DataFrame A new DataFrame containing selected columns.
withColumn
Add a new column or set an existing one.
Parameters
columnName
String The column to modify or to create.
func
Function The function to create the column. (optional, default (row, index) => undefined)
Examples
df.withColumn('column4', () => 2)
df.withColumn('column2', (row) => row.get('column2') * 2)
Returns DataFrame A new DataFrame containing the new or modified column.
restructure
Modify the structure of the DataFrame by changing columns order, creating new columns or removing some columns.
Parameters
newColumnNames
Array The new columns of the DataFrame.
Examples
df.restructure(['column1', 'column4', 'column2', 'column3'])
df.restructure(['column1', 'column4'])
df.restructure(['column1', 'newColumn', 'column4'])
Returns DataFrame A new DataFrame with restructured columns (reordered, added or removed).
renameAll
Rename each column.
Parameters
newColumnNames
Array The new column names of the DataFrame.
Examples
df.renameAll(['column1', 'column3', 'column4'])
Returns DataFrame A new DataFrame with the new column names.
rename
Rename a column.
Parameters
Examples
df.rename('column1', 'columnRenamed')
Returns DataFrame A new DataFrame with the new column name.
castAll
Cast each column into a given type.
Parameters
typeFunctions
Array The functions used to cast columns.
Examples
df.castAll([Number, String, (val) => new CustomClass(val)])
Returns DataFrame A new DataFrame with the columns having new types.
cast
Cast a column into a given type.
Parameters
columnName
String The column to cast.
typeFunction
ObjectType
Function The function used to cast the column.
Examples
df.cast('column1', Number)
df.cast('column1', (val) => new MyCustomClass(val))
Returns DataFrame A new DataFrame with the column having a new type.
drop
Remove a single column.
Parameters
columnName
String The column to drop.
Examples
df.drop('column2')
Returns DataFrame A new DataFrame without the dropped column.
chain
Chain map and filter functions on the DataFrame, optimizing their execution. If a function returns a boolean, it is treated as a filter; otherwise, it is treated as a map. It can be 10 to 100 times faster than standard chains of .map() and .filter().
Parameters
funcs
...Function Functions to apply on the DataFrame rows taking the row as parameter.
Examples
df.chain(
row => row.get('column1') > 3, // filter
row => row.set('column1', 3), // map
row => row.get('column2') === '5' // filter
)
Returns DataFrame A new DataFrame with modified rows.
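The documented rule (a boolean result means filter, anything else means map) can be sketched as a single pass over the rows, which avoids building an intermediate array after each step. This is an illustration on plain objects, not dataframe-js's implementation; `chainRows` is a hypothetical name, and maps are assumed to return a new row.

```javascript
// Hypothetical sketch: apply filters (boolean result) and maps (new row)
// in one pass per row, with no intermediate arrays between steps.
function chainRows(rows, ...funcs) {
  const result = [];
  for (const row of rows) {
    let current = row;
    let keep = true;
    for (const func of funcs) {
      const output = func(current);
      if (typeof output === 'boolean') {
        if (!output) {
          keep = false; // a filter rejected the row: skip remaining functions
          break;
        }
      } else {
        current = output; // a map: continue with the transformed row
      }
    }
    if (keep) result.push(current);
  }
  return result;
}

const rows = [
  { column1: 2, column2: '5' },
  { column1: 4, column2: '5' },
  { column1: 7, column2: '1' },
];
const out = chainRows(
  rows,
  (row) => row.column1 > 3, // filter
  (row) => ({ ...row, column1: 3 }), // map
  (row) => row.column2 === '5' // filter
);
console.log(out); // [ { column1: 3, column2: '5' } ]
```

Because a rejected row stops at the first failing filter, later maps never run on it, which is one plausible source of the speed-up over separate .map() and .filter() calls.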
filter
Filter DataFrame rows.
Parameters
Examples
df.filter(row => row.get('column1') >= 3)
df.filter({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
where
Filter DataFrame rows. Alias of .filter()
Parameters
Examples
df.where(row => row.get('column1') >= 3)
df.where({'column2': 5, 'column1': 3})
Returns DataFrame A new filtered DataFrame.
find
Find the first row matching a condition.
Parameters
Examples
df.find(row => row.get('column1') === 3)
df.find({'column1': 3})
Returns Row The targeted Row.
map
Map over DataFrame rows. /!\ Prefer using .chain().
Parameters
func
Function A function to apply on each row taking the row as parameter.
Examples
df.map(row => row.set('column1', row.get('column1') * 2))
Returns DataFrame A new DataFrame with modified rows.
reduce
Reduce the DataFrame to a single value.
Parameters
func
Function The reduce function taking 2 parameters, previous and next.
init
The initial value of the reducer.
Examples
df.reduce((p, n) => n.get('column1') + p, 0)
df2.reduce((p, n) => (
n.set('column1', p.get('column1') + n.get('column1'))
.set('column2', p.get('column2') + n.get('column2'))
))
Returns any A reduced value.
reduceRight
Reduce the DataFrame to a single value, starting from the last row (see .reduce()).
Parameters
func
Function The reduce function taking 2 parameters, previous and next.
init
The initial value of the reducer.
Examples
df.reduceRight((p, n) => (n.get('column1') > p ? n.get('column1') : p), 0)
Returns any A reduced value.
dropDuplicates
Return a DataFrame without duplicated rows.
Parameters
columnNames
...String The columns used to check the uniqueness of rows. If omitted, uniqueness is checked on all columns.
Examples
df.dropDuplicates('id', 'name')
Returns DataFrame A DataFrame without duplicated rows.
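Checking uniqueness on a subset of columns amounts to building a key from those columns and keeping the first row for each key. The sketch below assumes plain row objects and that the first occurrence is the one kept (an assumption about the library). `dropDuplicateRows` is a hypothetical helper.

```javascript
// Hypothetical sketch: keep the first row for each unique combination
// of the given columns; with no columns, all keys of each row are used.
// The key is built with JSON.stringify, so undefined, null and NaN
// all collapse to null, which is acceptable for an illustration.
function dropDuplicateRows(rows, ...columnNames) {
  const seen = new Set();
  return rows.filter((row) => {
    const columns = columnNames.length > 0 ? columnNames : Object.keys(row);
    const key = JSON.stringify(columns.map((column) => row[column]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const rows = [
  { id: 1, name: 'a', v: 1 },
  { id: 1, name: 'a', v: 2 },
  { id: 2, name: 'a', v: 3 },
];
console.log(dropDuplicateRows(rows, 'id', 'name').map((row) => row.v)); // [ 1, 3 ]
```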
dropMissingValues
Return a DataFrame without rows containing missing values (undefined, NaN, null).
Parameters
columnNames
Array The columns to consider. All columns are considered by default.
Examples
df.dropMissingValues(['id', 'name'])
Returns DataFrame A DataFrame without rows containing missing values.
fillMissingValues
Return a DataFrame with missing values (undefined, NaN, null) filled with a replacement value.
Parameters
replacement
The new value.
columnNames
Array The columns to consider. All columns are considered by default.
Examples
df.fillMissingValues(0, ['id', 'name'])
Returns DataFrame A DataFrame with missing values replaced.
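The definition of a missing value given above (undefined, NaN, null) translates directly into a predicate. The sketch below uses it to fill values in plain row objects; the same predicate could filter rows for a dropMissingValues-style operation. `isMissing` and `fillMissing` are hypothetical helpers, not the library's code.

```javascript
// A value is "missing" if it is undefined, null or NaN, as documented above.
// Number.isNaN avoids treating non-numeric strings as NaN.
function isMissing(value) {
  return value === undefined || value === null || Number.isNaN(value);
}

// Hypothetical sketch: replace missing values in the given columns
// (all columns of each row by default), returning new row objects.
function fillMissing(rows, replacement, columnNames) {
  return rows.map((row) => {
    const columns = columnNames || Object.keys(row);
    const filled = { ...row };
    for (const column of columns) {
      if (isMissing(filled[column])) filled[column] = replacement;
    }
    return filled;
  });
}

const rows = [
  { id: null, name: 'x', v: NaN },
  { id: 2, name: undefined, v: 3 },
];
console.log(fillMissing(rows, 0, ['id', 'name']));
// [ { id: 0, name: 'x', v: NaN }, { id: 2, name: 0, v: 3 } ]
```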
shuffle
Return a DataFrame with shuffled rows.
Examples
df.shuffle()
Returns DataFrame A shuffled DataFrame.
sample
Return a random sample of rows.
Parameters
percentage
Number A percentage of the original DataFrame, expressed as a fraction (e.g. 0.3 for 30%), giving the sample size.
Examples
df.sample(0.3)
Returns DataFrame A sample DataFrame.
bisect
Randomly split a DataFrame into 2 DataFrames.
Parameters
percentage
Number A percentage of the original DataFrame, expressed as a fraction (e.g. 0.3 for 30%), giving the size of the first DataFrame. The second takes the rest.
Examples
const [df30, df70] = df.bisect(0.3)
Returns Array An Array containing the two DataFrames. First, the X% DataFrame then the rest DataFrame.
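A random split can be sketched as a Fisher-Yates shuffle followed by a cut at the requested fraction. This is a swapped-in technique for illustration on plain arrays; how dataframe-js actually randomizes and rounds the split size is not stated here, so `bisectRows` and its use of `Math.round` are assumptions.

```javascript
// Hypothetical sketch: randomly split rows into two arrays, the first
// holding about `percentage` (a fraction such as 0.3) of the rows.
function bisectRows(rows, percentage) {
  const shuffled = [...rows];
  // Fisher-Yates shuffle: swap each position with a random earlier one.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Rounding the cut point is an assumption of this sketch.
  const cut = Math.round(shuffled.length * percentage);
  return [shuffled.slice(0, cut), shuffled.slice(cut)];
}

const [df30, df70] = bisectRows(Array.from({ length: 10 }, (_, i) => i), 0.3);
console.log(df30.length, df70.length); // 3 7
```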
groupBy
Group DataFrame rows by columns giving a GroupedDataFrame object. See its doc for more examples.
Parameters
args
...any
columnNames
...String The columns used for the groupBy.
Examples
df.groupBy('column1')
df.groupBy('column1', 'column2')
df.groupBy('column1', 'column2').listGroups()
df.groupBy('column1', 'column2').show()
df.groupBy('column1', 'column2').aggregate((group) => group.count())
Returns GroupedDataFrame A GroupedDataFrame object.
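Conceptually, grouping by several columns maps each distinct combination of their values to the rows sharing it. The sketch below shows that on plain row objects with a `Map`; it does not reproduce the GroupedDataFrame API, and `groupRows` plus its JSON-based key are hypothetical.

```javascript
// Hypothetical sketch: group rows by the values of one or more columns.
// Returns a Map from a group key (JSON of the column values) to its rows,
// in order of first appearance.
function groupRows(rows, ...columnNames) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(columnNames.map((column) => row[column]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

const rows = [
  { column1: 'a', column2: 1 },
  { column1: 'b', column2: 1 },
  { column1: 'a', column2: 1 },
];
// Count the rows of each group, similar in spirit to
// df.groupBy('column1', 'column2').aggregate((group) => group.count()).
for (const [key, group] of groupRows(rows, 'column1', 'column2')) {
  console.log(key, group.length);
}
// ["a",1] 2
// ["b",1] 1
```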
sortBy
Sort DataFrame rows based on column values. Each sorted column should contain only one variable type. Columns are sorted left to right: the first column has priority.
Parameters
columnNames
(String | Array<string>) The columns giving the order.
reverse
Boolean Reverse mode. Reverse the order if true. (optional, default false)
missingValuesPosition
String Define the position of missing values (undefined, null and NaN) in the order. (optional, default 'first')
Examples
df.sortBy('id')
df.sortBy(['id1', 'id2'])
df.sortBy(['id1'], true)
Returns DataFrame An ordered DataFrame.
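A multi-column sort is a comparator that checks each column in turn and only moves to the next column on a tie. The sketch below works on plain row objects and places missing values according to `missingValuesPosition`. It assumes `reverse` flips only non-missing values and that `'last'` is the alternative to `'first'`; both are assumptions, not confirmed library behavior. `sortRows` is hypothetical.

```javascript
const isMissing = (v) => v === undefined || v === null || Number.isNaN(v);

// Hypothetical sketch: sort rows by several columns, left to right.
// Missing values go 'first' or 'last'; `reverse` flips only the order
// of non-missing values (an assumption of this sketch).
function sortRows(rows, columnNames, reverse = false, missingValuesPosition = 'first') {
  const columns = Array.isArray(columnNames) ? columnNames : [columnNames];
  const missingOrder = missingValuesPosition === 'first' ? -1 : 1;
  return [...rows].sort((a, b) => {
    for (const column of columns) {
      const x = a[column];
      const y = b[column];
      const xMissing = isMissing(x);
      const yMissing = isMissing(y);
      if (xMissing && yMissing) continue; // tie: look at the next column
      if (xMissing) return missingOrder;
      if (yMissing) return -missingOrder;
      if (x < y) return reverse ? 1 : -1;
      if (x > y) return reverse ? -1 : 1;
    }
    return 0; // equal on every column
  });
}

const rows = [
  { id1: 2, id2: 1 },
  { id1: null, id2: 0 },
  { id1: 1, id2: 5 },
  { id1: 1, id2: 3 },
];
console.log(sortRows(rows, ['id1', 'id2']).map((r) => r.id2)); // [ 0, 3, 5, 1 ]
```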
union
Concatenate two DataFrames.
Parameters
dfToUnion
DataFrame The DataFrame to concatenate.
Examples
df.union(df2)
Returns DataFrame A new concatenated DataFrame resulting from the union.
join
Join two DataFrames.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
how
String The join mode. Can be: full, inner, outer, left, right. (optional, default 'inner')
Examples
df.join(df2, 'column1', 'full')
Returns DataFrame The joined DataFrame.
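The inner, left, right and full modes differ only in which unmatched rows are kept. The sketch below shows this with a hash join on one key column over plain row objects. It is an illustration, not dataframe-js's implementation: `joinRows` is hypothetical, it does not cover the outer mode, and on a column-name clash the right row's value wins.

```javascript
// Hypothetical sketch of a hash join on one key column.
// how: 'inner' | 'left' | 'right' | 'full'. Unmatched rows keep only
// their own columns; on a name clash, right-hand values win.
function joinRows(left, right, key, how = 'inner') {
  // Index right rows by key value.
  const rightByKey = new Map();
  for (const row of right) {
    if (!rightByKey.has(row[key])) rightByKey.set(row[key], []);
    rightByKey.get(row[key]).push(row);
  }
  const result = [];
  const matchedKeys = new Set();
  for (const l of left) {
    const matches = rightByKey.get(l[key]);
    if (matches) {
      matchedKeys.add(l[key]);
      for (const r of matches) result.push({ ...l, ...r });
    } else if (how === 'left' || how === 'full') {
      result.push({ ...l }); // keep unmatched left rows
    }
  }
  if (how === 'right' || how === 'full') {
    for (const r of right) {
      if (!matchedKeys.has(r[key])) result.push({ ...r }); // unmatched right rows
    }
  }
  return result;
}

const a = [{ id: 1, x: 'a' }, { id: 2, x: 'b' }];
const b = [{ id: 2, y: 'B' }, { id: 3, y: 'C' }];
console.log(joinRows(a, b, 'id')); // [ { id: 2, x: 'b', y: 'B' } ]
console.log(joinRows(a, b, 'id', 'full').length); // 3
```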
innerJoin
Join two DataFrames with inner mode.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
Examples
df.innerJoin(df2, 'id')
df.join(df2, 'id')
df.join(df2, 'id', 'inner')
Returns DataFrame The joined DataFrame.
fullJoin
Join two DataFrames with full mode.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
Examples
df.fullJoin(df2, 'id')
df.join(df2, 'id', 'full')
Returns DataFrame The joined DataFrame.
outerJoin
Join two DataFrames with outer mode.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
Examples
df.outerJoin(df2, 'id')
df.join(df2, 'id', 'outer')
Returns DataFrame The joined DataFrame.
leftJoin
Join two DataFrames with left mode.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
Examples
df.leftJoin(df2, 'id')
df.join(df2, 'id', 'left')
Returns DataFrame The joined DataFrame.
rightJoin
Join two DataFrames with right mode.
Parameters
dfToJoin
DataFrame The DataFrame to join.
columnNames
(String | Array) The selected columns for the join.
Examples
df.rightJoin(df2, 'id')
df.join(df2, 'id', 'right')
Returns DataFrame The joined DataFrame.
diff
Find the differences between two DataFrames (reverse of join).
Parameters
dfToDiff
DataFrame The DataFrame to diff.
columnNames
(String | Array) The selected columns for the diff.
Examples
df.diff(df2, 'id')
Returns DataFrame The differences DataFrame.
head
Create a new subset DataFrame based on the first rows.
Parameters
nRows
Number The number of first rows to get. (optional, default 10)
Examples
df2.head()
df2.head(5)
Returns DataFrame The subset DataFrame.
tail
Create a new subset DataFrame based on the last rows.
Parameters
nRows
Number The number of last rows to get. (optional, default 10)
Examples
df2.tail()
df2.tail(5)
Returns DataFrame The subset DataFrame.
slice
Create a new subset DataFrame based on given indices. Similar to Array.prototype.slice.
Parameters
startIndex
Number The index to start the slice (included). (optional, default 0)
endIndex
Number The index to end the slice (excluded). (optional, default this.count())
Examples
df2.slice()
df2.slice(0)
df2.slice(0, 20)
df2.slice(10, 30)
Returns DataFrame The subset DataFrame.
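Since slice is described as similar to Array.prototype.slice, head, tail and slice can be expressed with it on a plain array of rows, using the documented defaults. These helpers are hypothetical illustrations; note that a naive `rows.slice(-nRows)` would return every row when `nRows` is 0, so the tail helper computes a start index instead.

```javascript
// Hypothetical helpers mirroring head, tail and slice on an array of rows,
// using the defaults documented above.
const head = (rows, nRows = 10) => rows.slice(0, nRows);
// slice(-0) would return all rows, so compute the start index explicitly.
const tail = (rows, nRows = 10) => rows.slice(Math.max(rows.length - nRows, 0));
const sliceRows = (rows, startIndex = 0, endIndex = rows.length) => rows.slice(startIndex, endIndex);

const rows = Array.from({ length: 40 }, (_, i) => i);
console.log(head(rows, 5)); // [ 0, 1, 2, 3, 4 ]
console.log(tail(rows, 5)); // [ 35, 36, 37, 38, 39 ]
console.log(sliceRows(rows, 10, 30).length); // 20
```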
getRow
Return a Row by its index.
Parameters
index
Number The index to select the row. (optional, default 0)
Examples
df2.getRow(1)
Returns Row The Row.
setRow
Modify a Row at the given index.
Parameters
index
Number The index to select the row. (optional, default 0)
func
Function The function to apply to the selected Row. (optional, default row => row)
Examples
df2.setRow(1, row => row.set('column1', 33))
Returns DataFrame A new DataFrame with the modified Row.
setDefaultModules
Set the default modules used in DataFrame instances.
Parameters
defaultModules
...Object DataFrame modules used by default.
Examples
DataFrame.setDefaultModules(SQL, Stat)
fromDSV
Create a DataFrame from a delimiter-separated values text file. It returns a Promise.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
sep
String The separator used to parse the file.
header
Boolean A boolean indicating if the text has a header or not. (optional, default true)
Examples
DataFrame.fromDSV('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromDSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromDSV('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromDSV('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
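The role of the `sep` and `header` parameters can be sketched with a small parser that splits text into column names and rows. This is a simplified illustration, not the library's parser: `parseDSV` is hypothetical, it handles no quoting, escaping or type inference (all values stay strings), it skips empty lines, and without a header it returns `null` column names rather than guessing the library's naming scheme.

```javascript
// Hypothetical, simplified DSV parser: no quoting, escaping or type inference.
// Returns { columns, rows }; columns is null when there is no header line.
function parseDSV(text, sep, header = true) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const rows = lines.map((line) => line.split(sep));
  if (!header) return { columns: null, rows };
  // The first line holds the column names; the rest is data.
  const [columns = [], ...data] = rows;
  return { columns, rows: data };
}

console.log(parseDSV('a;b\n1;2\n3;4', ';'));
// { columns: [ 'a', 'b' ], rows: [ [ '1', '2' ], [ '3', '4' ] ] }
```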
fromText
Create a DataFrame from a delimiter-separated values text file. It returns a Promise. Alias of DataFrame.fromDSV.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
sep
String The separator used to parse the file.
header
Boolean A boolean indicating if the text has a header or not. (optional, default true)
Examples
DataFrame.fromText('http://myurl/myfile.txt').then(df => df.show())
// In browser only
DataFrame.fromText(myFile).then(df => df.show())
// From node.js only
DataFrame.fromText('/my/absolute/path/myfile.txt').then(df => df.show())
DataFrame.fromText('/my/absolute/path/myfile.txt', ';', true).then(df => df.show())
fromCSV
Create a DataFrame from a comma-separated values file. It returns a Promise.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean A boolean indicating if the CSV has a header or not. (optional, default true)
Examples
DataFrame.fromCSV('http://myurl/myfile.csv').then(df => df.show())
// For browser only
DataFrame.fromCSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromCSV('/my/absolute/path/myfile.csv').then(df => df.show())
DataFrame.fromCSV('/my/absolute/path/myfile.csv', true).then(df => df.show())
fromTSV
Create a DataFrame from a tab-separated values file. It returns a Promise.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean A boolean indicating if the TSV has a header or not. (optional, default true)
Examples
DataFrame.fromTSV('http://myurl/myfile.tsv').then(df => df.show())
// For browser only
DataFrame.fromTSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromTSV('/my/absolute/path/myfile.tsv').then(df => df.show())
DataFrame.fromTSV('/my/absolute/path/myfile.tsv', true).then(df => df.show())
fromPSV
Create a DataFrame from a pipe-separated values file. It returns a Promise.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
header
Boolean A boolean indicating if the PSV has a header or not. (optional, default true)
Examples
DataFrame.fromPSV('http://myurl/myfile.psv').then(df => df.show())
// For browser only
DataFrame.fromPSV(myFile).then(df => df.show())
// From node.js only
DataFrame.fromPSV('/my/absolute/path/myfile.psv').then(df => df.show())
DataFrame.fromPSV('/my/absolute/path/myfile.psv', true).then(df => df.show())
fromJSON
Create a DataFrame from a JSON file. It returns a Promise.
Parameters
args
...any
pathOrFile
(String | File) A path to the file (URL or local) or a browser File object.
Examples
DataFrame.fromJSON('http://myurl/myfile.json').then(df => df.show())
// For browser only
DataFrame.fromJSON(myFile).then(df => df.show())
// From node.js only
DataFrame.fromJSON('/my/absolute/path/myfile.json').then(df => df.show())