
Numpy char.split() Function



The Numpy char.split() function is used to split each string element of an array into a list of substrings based on a specified delimiter.

By default, the split() function splits on whitespace, but we can provide a custom delimiter. This makes it useful for tokenizing or parsing text data.

This function processes each string in the input array individually and returns an array of the same shape, in which each element is a Python list of the substrings produced by the split operation.
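To illustrate the shape-preserving behaviour on a multidimensional input, here is a small sketch using a made-up 2x2 array (the names grid and parts are only illustrative). Because each cell of the result holds an ordinary Python list, the returned array has object dtype −

```python
import numpy as np

# A 2x2 array of strings; np.char.split works on each element separately
grid = np.array([['a b', 'c d e'],
                 ['f', 'g h']])
parts = np.char.split(grid)

print(parts.shape)   # (2, 2): same shape as the input
print(parts.dtype)   # object: each cell holds a Python list
print(parts[0, 1])   # ['c', 'd', 'e']
```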

Syntax

Following is the syntax of Numpy char.split() function −

numpy.char.split(a, sep=None, maxsplit=None)

Parameters

Following are the parameters of Numpy char.split() function −

  • a (array-like of str or unicode): The input array containing the strings to be split.

  • sep (str, optional): The delimiter on which to split the strings. If not provided (None), the strings are split on runs of whitespace.

  • maxsplit (int, optional): The maximum number of splits to perform on each string. If not provided (None), there is no limit on the number of splits; as with Python's str.split(), a value of -1 also means no limit.
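The difference between leaving sep unset and passing a single space is easy to miss. In this sketch (the sample strings are made up), the default treats a run of whitespace as one separator and ignores leading whitespace, while sep=' ' splits at every single space and therefore produces empty strings −

```python
import numpy as np

arr = np.array(['a  b', ' c d'])

# sep=None (default): runs of whitespace count as one separator,
# and leading whitespace does not produce an empty item
default = np.char.split(arr)

# sep=' ': every individual space is a separator, so empty strings appear
spaces = np.char.split(arr, sep=' ')

print(default)
print(spaces)
```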

Return Value

This function returns an array of object dtype with the same shape as the input, where each string element is replaced by a list of the substrings resulting from the split operation.
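Since the result holds Python lists rather than a regular 2-D array, one common follow-up is to convert it when every string splits into the same number of pieces. The sketch below assumes made-up date strings; the conversion only works because all lists have equal length (recent NumPy versions raise an error for ragged lists) −

```python
import numpy as np

arr = np.array(['2024-01-15', '2023-12-31'])
parts = np.char.split(arr, sep='-')   # 1-D object array of lists

# Every list has three items here, so the nested lists can be
# turned into a regular 2-D string array
table = np.array(parts.tolist())

print(table.shape)   # (2, 3)
print(table[:, 0])   # ['2024' '2023']
```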

Example 1

The following is a basic example of the Numpy char.split() function, in which each string in the input array is split into a list of substrings wherever there is whitespace. The resulting array contains the lists of words extracted from each original string −

import numpy as np

arr = np.array(['apple banana cherry', 'date elderberry fig'])
split_arr = np.char.split(arr)
print(split_arr)

Below is the output of the basic example of the numpy.char.split() function −

[list(['apple', 'banana', 'cherry']) list(['date', 'elderberry', 'fig'])]
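NumPy documents char.split() as calling Python's str.split() element-wise, so the result above should match a plain list comprehension. This short check (the names vectorized and manual are only illustrative) compares the two −

```python
import numpy as np

arr = np.array(['apple banana cherry', 'date elderberry fig'])

vectorized = np.char.split(arr)
# The same result built by calling Python's str.split on each element
manual = [s.split() for s in arr.tolist()]

print(vectorized.tolist() == manual)   # True
```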

Example 2

We can use the char.split() function to split strings on a custom delimiter, which gives greater flexibility when parsing or tokenizing text data. In this example, we use a comma (',') as the delimiter to split the strings in the array −

import numpy as np

arr = np.array(['apple,banana,cherry', 'date,elderberry,fig'])
split_arr = np.char.split(arr, sep=',')
print(split_arr)

Here is the output of splitting with a custom delimiter −

[list(['apple', 'banana', 'cherry']) list(['date', 'elderberry', 'fig'])]
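The delimiter is not limited to a single character. In this sketch with made-up color names, sep=', ' removes the space after each comma, whereas sep=',' leaves a leading space on every item after the first −

```python
import numpy as np

arr = np.array(['red, green, blue', 'cyan, magenta'])

# sep may be longer than one character; here it is ", "
colors = np.char.split(arr, sep=', ')

# With only ',' as the separator, the following spaces stay in the items
commas = np.char.split(arr, sep=',')

print(colors)
print(commas)
```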

Example 3

We can use the maxsplit parameter of the char.split() function to control the number of splits performed. This is useful when we want to limit the number of substrings produced from each string element. The following example demonstrates the maxsplit parameter −

import numpy as np

arr = np.array(['one-two-three-four', 'five-six-seven'])
split_arr = np.char.split(arr, sep='-', maxsplit=2)
print(split_arr)

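Here is the output of limiting the number of splits. The first string is split only at its first two hyphens, so 'three-four' remains a single item, while the second string contains only two hyphens and is split completely −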

[list(['one', 'two', 'three-four']) list(['five', 'six', 'seven'])]
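A typical use of maxsplit=1 is parsing key=value text where the value itself may contain the delimiter. The settings strings below are made up for illustration; splitting only at the first '=' keeps the rest of the value intact −

```python
import numpy as np

settings = np.array(['path=/usr/bin', 'query=a=1&b=2'])

# maxsplit=1 splits only at the first '=', so any '=' in the value survives
pairs = np.char.split(settings, sep='=', maxsplit=1)

keys = [p[0] for p in pairs]
values = [p[1] for p in pairs]

print(keys)     # ['path', 'query']
print(values)   # ['/usr/bin', 'a=1&b=2']
```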