Description¶
This module can be used to handle comma-separated values (CSV) files and do lightweight processing of their data with support for row and column filtering. In addition to basic read, write and data replacement, files can be concatenated, merged, and sorted
Examples¶
Read/write¶
# pcsv_example_1.py
import pmisc, pcsv


def main():
    with pmisc.TmpFile() as fname:
        ref_data = [["Item", "Cost"], [1, 9.99], [2, 10000], [3, 0.10]]
        # Write reference data to a file
        pcsv.write(fname, ref_data, append=False)
        # Read the data back
        obj = pcsv.CsvFile(fname)
    # After the object creation the I/O is done,
    # can safely remove file (exit context manager)
    # Check that data read is correct
    assert obj.header() == ref_data[0]
    assert obj.data() == ref_data[1:]
    # Add a simple row filter, only look at rows that have
    # values 1 and 3 in the "Item" column
    obj.rfilter = {"Item": [1, 3]}
    assert obj.data(filtered=True) == [ref_data[1], ref_data[3]]


if __name__ == "__main__":
    main()
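The row filter assignment above keeps only rows whose "Item" value appears in the given list. pcsv aside, the same selection can be sketched in plain Python (the names below are illustrative, not part of the pcsv API):

```python
header = ["Item", "Cost"]
rows = [[1, 9.99], [2, 10000], [3, 0.10]]
# A row filter maps a column to the set of values to keep
rfilter = {"Item": [1, 3]}
col = header.index("Item")
kept = [row for row in rows if row[col] in rfilter["Item"]]
```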
Replace data¶
# pcsv_example_2.py
import pmisc, pcsv


def main():
    ctx = pmisc.TmpFile
    with ctx() as fname1:
        with ctx() as fname2:
            with ctx() as ofname:
                # Create first (input) data file
                input_data = [["Item", "Cost"], [1, 9.99], [2, 10000], [3, 0.10]]
                pcsv.write(fname1, input_data, append=False)
                # Create second (replacement) data file
                replacement_data = [
                    ["Staff", "Rate", "Days"],
                    ["Joe", 10, "Sunday"],
                    ["Sue", 20, "Thursday"],
                    ["Pat", 15, "Tuesday"],
                ]
                pcsv.write(fname2, replacement_data, append=False)
                # Replace "Cost" column of input file with "Rate" column
                # of replacement file for "Item" values 1 and 3, using
                # "Staff" data from Joe and Pat. Save resulting data to
                # another file
                pcsv.replace(
                    fname1=fname1,
                    dfilter1=("Cost", {"Item": [1, 3]}),
                    fname2=fname2,
                    dfilter2=("Rate", {"Staff": ["Joe", "Pat"]}),
                    ofname=ofname,
                )
                # Verify that resulting file is correct
                ref_data = [["Item", "Cost"], [1, 10], [2, 10000], [3, 15]]
                obj = pcsv.CsvFile(ofname)
                assert obj.header() == ref_data[0]
                assert obj.data() == ref_data[1:]


if __name__ == "__main__":
    main()
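Conceptually, the replacement moves the filtered column values of the second file into the filtered cells of the first. A plain-Python sketch of that data movement, with the row/column indices of this example hardcoded for illustration:

```python
input_rows = [[1, 9.99], [2, 10000], [3, 0.10]]
repl_values = [10, 15]   # "Rate" values for Joe and Pat
target_rows = [0, 2]     # rows where "Item" is 1 or 3
target_col = 1           # the "Cost" column
# Write each replacement value into the matching filtered cell
for row_idx, new_val in zip(target_rows, repl_values):
    input_rows[row_idx][target_col] = new_val
```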
Concatenate two files¶
# pcsv_example_3.py
import pmisc, pcsv


def main():
    ctx = pmisc.TmpFile
    with ctx() as fname1:
        with ctx() as fname2:
            with ctx() as ofname:
                # Create first data file
                data1 = [[1, 9.99], [2, 10000], [3, 0.10]]
                pcsv.write(fname1, data1, append=False)
                # Create second data file
                data2 = [
                    ["Joe", 10, "Sunday"],
                    ["Sue", 20, "Thursday"],
                    ["Pat", 15, "Tuesday"],
                ]
                pcsv.write(fname2, data2, append=False)
                # Concatenate file1 and file2. Filter out
                # second column of file2
                pcsv.concatenate(
                    fname1=fname1,
                    fname2=fname2,
                    has_header1=False,
                    has_header2=False,
                    dfilter2=[0, 2],
                    ofname=ofname,
                    ocols=["D1", "D2"],
                )
                # Verify that resulting file is correct
                ref_data = [
                    ["D1", "D2"],
                    [1, 9.99],
                    [2, 10000],
                    [3, 0.10],
                    ["Joe", "Sunday"],
                    ["Sue", "Thursday"],
                    ["Pat", "Tuesday"],
                ]
                obj = pcsv.CsvFile(ofname)
                assert obj.header() == ref_data[0]
                assert obj.data() == ref_data[1:]


if __name__ == "__main__":
    main()
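The concatenation above stacks the rows of the second file (keeping only columns 0 and 2) under the rows of the first and prepends the output header. A minimal plain-Python sketch of that behavior; the helper name is illustrative, not part of the pcsv API:

```python
def concat_rows(rows1, rows2, cols2, ocols):
    # Keep only the selected columns of the second file's rows
    picked = [[row[i] for i in cols2] for row in rows2]
    # Output header first, then rows of file1, then filtered rows of file2
    return [ocols] + rows1 + picked

data1 = [[1, 9.99], [2, 10000], [3, 0.10]]
data2 = [["Joe", 10, "Sunday"], ["Sue", 20, "Thursday"], ["Pat", 15, "Tuesday"]]
result = concat_rows(data1, data2, cols2=(0, 2), ocols=["D1", "D2"])
```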
Merge two files¶
# pcsv_example_4.py
import pmisc, pcsv


def main():
    ctx = pmisc.TmpFile
    with ctx() as fname1:
        with ctx() as fname2:
            with ctx() as ofname:
                # Create first data file
                data1 = [[1, 9.99], [2, 10000], [3, 0.10]]
                pcsv.write(fname1, data1, append=False)
                # Create second data file
                data2 = [
                    ["Joe", 10, "Sunday"],
                    ["Sue", 20, "Thursday"],
                    ["Pat", 15, "Tuesday"],
                ]
                pcsv.write(fname2, data2, append=False)
                # Merge file1 and file2
                pcsv.merge(
                    fname1=fname1,
                    has_header1=False,
                    fname2=fname2,
                    has_header2=False,
                    ofname=ofname,
                )
                # Verify that resulting file is correct
                ref_data = [
                    [1, 9.99, "Joe", 10, "Sunday"],
                    [2, 10000, "Sue", 20, "Thursday"],
                    [3, 0.10, "Pat", 15, "Tuesday"],
                ]
                obj = pcsv.CsvFile(ofname, has_header=False)
                assert obj.header() == list(range(0, 5))
                assert obj.data() == ref_data


if __name__ == "__main__":
    main()
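The merge pairs rows of the two files side by side; when the files have different numbers of rows, the shorter one is padded with empty values. itertools.zip_longest sketches that pairing (the helper and its width parameters are illustrative, not pcsv internals):

```python
from itertools import zip_longest

def merge_rows(rows1, rows2, width1, width2):
    merged = []
    for r1, r2 in zip_longest(rows1, rows2):
        # A missing row on either side becomes a run of empty (None) values
        r1 = r1 if r1 is not None else [None] * width1
        r2 = r2 if r2 is not None else [None] * width2
        merged.append(r1 + r2)
    return merged

rows1 = [[1, 9.99], [2, 10000], [3, 0.10]]
rows2 = [["Joe", 10, "Sunday"], ["Sue", 20, "Thursday"]]
# rows2 is one row short, so the last merged row ends with None padding
out = merge_rows(rows1, rows2, width1=2, width2=3)
```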
Sort a file¶
# pcsv_example_5.py
import pmisc, pcsv


def main():
    ctx = pmisc.TmpFile
    with ctx() as ifname:
        with ctx() as ofname:
            # Create input data file
            data = [
                ["Ctrl", "Ref", "Result"],
                [1, 3, 10],
                [1, 4, 20],
                [2, 4, 30],
                [2, 5, 40],
                [3, 5, 50],
            ]
            pcsv.write(ifname, data, append=False)
            # Sort
            pcsv.dsort(
                fname=ifname,
                order=[{"Ctrl": "D"}, {"Ref": "A"}],
                has_header=True,
                ofname=ofname,
            )
            # Verify that resulting file is correct
            ref_data = [[3, 5, 50], [2, 4, 30], [2, 5, 40], [1, 3, 10], [1, 4, 20]]
            obj = pcsv.CsvFile(ofname, has_header=True)
            assert obj.header() == ["Ctrl", "Ref", "Result"]
            assert obj.data() == ref_data


if __name__ == "__main__":
    main()
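The order argument above sorts descending on "Ctrl" and then ascending on "Ref". For numeric columns the same ordering can be reproduced with the built-in sorted and a negated first key (a conceptual sketch, not pcsv's implementation):

```python
rows = [[1, 3, 10], [1, 4, 20], [2, 4, 30], [2, 5, 40], [3, 5, 50]]
# Descending on column 0 ("Ctrl"), ascending on column 1 ("Ref")
ordered = sorted(rows, key=lambda row: (-row[0], row[1]))
```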
Interpreter¶
The package has been developed and tested with Python 2.7, 3.5, 3.6 and 3.7 under Linux (Debian, Ubuntu), Apple macOS and Microsoft Windows
Installing¶
$ pip install pcsv
Documentation¶
Available at Read the Docs
Contributing¶
Abide by the adopted code of conduct
Fork the repository from GitHub and then clone a personal copy [1]:
$ github_user=myname
$ git clone --recurse-submodules \
      https://github.com/"${github_user}"/pcsv.git
Cloning into 'pcsv'...
...
$ cd pcsv || exit 1
$ export PCSV_DIR=${PWD}
$
The package uses two sub-modules: a set of custom Pylint plugins to help with some areas of code quality and consistency (under the pylint_plugins directory), and a lightweight package management framework (under the pypkg directory). Additionally, the pre-commit framework is used to perform various pre-commit code quality and consistency checks. To enable the pre-commit hooks:
$ cd "${PCSV_DIR}" || exit 1
$ pre-commit install
pre-commit installed at .../pcsv/.git/hooks/pre-commit
$
Ensure that the Python interpreter can find the package modules (update the $PYTHONPATH environment variable, append to sys.path, etc.):
$ export PYTHONPATH=${PYTHONPATH}:${PCSV_DIR}
$
Install the dependencies (if needed, done automatically by pip):
- Cog (2.5.1 or newer)
- Coverage (4.5.3 or newer)
- Docutils (0.14 or newer)
- Inline Syntax Highlight Sphinx Extension (0.2 or newer)
- Mock (Python 2.x only, 2.0.0 or newer)
- Pexdoc (1.1.4 or newer)
- Pmisc (1.5.8 or newer)
- Py.test (4.3.1 or newer)
- PyContracts (1.8.2 or newer)
- Pydocstyle (3.0.0 or newer)
- Pylint (Python 2.x: 1.9.4 or newer, Python 3.x: 2.3.1 or newer)
- Pytest-coverage (2.6.1 or newer)
- Pytest-pmisc (1.0.7 or newer)
- Pytest-xdist (optional, 1.26.1 or newer)
- ReadTheDocs Sphinx theme (0.4.3 or newer)
- Shellcheck Linter Sphinx Extension (1.0.8 or newer)
- Sphinx (1.8.5 or newer)
- Tox (3.7.0 or newer)
- Virtualenv (16.4.3 or newer)
Implement a new feature or fix a bug
Write a unit test which shows that the contributed code works as expected. Run the package tests to ensure that the bug fix or new feature does not have adverse side effects. If possible achieve 100% code and branch coverage of the contribution. Thorough package validation can be done via Tox and Pytest:
$ PKG_NAME=pcsv tox
GLOB sdist-make: .../pcsv/setup.py
py27-pkg create: .../pcsv/.tox/py27
py27-pkg installdeps: -r.../pcsv/requirements/tests_py27.pip, -r.../pcsv/requirements/docs_py27.pip
...
  py27-pkg: commands succeeded
  py35-pkg: commands succeeded
  py36-pkg: commands succeeded
  py37-pkg: commands succeeded
  congratulations :)
$
Setuptools can also be used (Tox is configured as its virtual environment manager):
$ PKG_NAME=pcsv python setup.py tests
running tests
running egg_info
writing pcsv.egg-info/PKG-INFO
writing dependency_links to pcsv.egg-info/dependency_links.txt
writing requirements to pcsv.egg-info/requires.txt
...
  py27-pkg: commands succeeded
  py35-pkg: commands succeeded
  py36-pkg: commands succeeded
  py37-pkg: commands succeeded
  congratulations :)
$
Tox (or Setuptools via Tox) runs with the following default environments: py27-pkg, py35-pkg, py36-pkg and py37-pkg [3]. These use the 2.7, 3.5, 3.6 and 3.7 interpreters, respectively, to test all code in the documentation (both in Sphinx *.rst source files and in docstrings), run all unit tests, measure test coverage and re-build the exceptions documentation. To pass arguments to Pytest (the test runner) use a double dash (--) after all the Tox arguments, for example:
$ PKG_NAME=pcsv tox -e py27-pkg -- -n 4
GLOB sdist-make: .../pcsv/setup.py
py27-pkg inst-nodeps: .../pcsv/.tox/.tmp/package/1/pcsv-1.0.8.zip
...
  py27-pkg: commands succeeded
  congratulations :)
$
Or use the -a Setuptools optional argument followed by a quoted string with the arguments for Pytest. For example:
$ PKG_NAME=pcsv python setup.py tests -a "-e py27-pkg -- -n 4"
running tests
...
  py27-pkg: commands succeeded
  congratulations :)
$
There are other convenience environments defined for Tox [3]:
- py27-repl, py35-repl, py36-repl and py37-repl run the Python 2.7, 3.5, 3.6 and 3.7 REPL, respectively, in the appropriate virtual environment. The pcsv package is pip-installed by Tox when the environments are created. Arguments to the interpreter can be passed in the command line after a double dash (--).
- py27-test, py35-test, py36-test and py37-test run Pytest using the Python 2.7, 3.5, 3.6 and 3.7 interpreter, respectively, in the appropriate virtual environment. Arguments to Pytest can be passed in the command line after a double dash (--), for example:
$ PKG_NAME=pcsv tox -e py27-test -- -x test_pcsv.py
GLOB sdist-make: .../pcsv/setup.py
py27-pkg inst-nodeps: .../pcsv/.tox/.tmp/package/1/pcsv-1.0.8.zip
...
  py27-pkg: commands succeeded
  congratulations :)
$
- py27-cov, py35-cov, py36-cov and py37-cov measure code and branch coverage using the 2.7, 3.5, 3.6 and 3.7 interpreter, respectively, in the appropriate virtual environment. Arguments to Pytest can be passed in the command line after a double dash (--). The report can be found in ${PCSV_DIR}/.tox/py[PV]/usr/share/pcsv/tests/htmlcov/index.html, where [PV] stands for 2.7, 3.5, 3.6 or 3.7 depending on the interpreter used.
Verify that continuous integration tests pass. The package has continuous integration configured for Linux, Apple macOS and Microsoft Windows (all via Azure DevOps).
Document the new feature or bug fix (if needed). The script ${PCSV_DIR}/pypkg/build_docs.py re-builds the whole package documentation (re-generates images, cogs source files, etc.):
$ "${PCSV_DIR}"/pypkg/build_docs.py -h
usage: build_docs.py [-h] [-d DIRECTORY] [-r] [-n NUM_CPUS] [-t]

Build pcsv package documentation

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        specify source file directory (default ../pcsv)
  -r, --rebuild         rebuild exceptions documentation. If no module name
                        is given all modules with auto-generated exceptions
                        documentation are rebuilt
  -n NUM_CPUS, --num-cpus NUM_CPUS
                        number of CPUs to use (default: 1)
  -t, --test            diff original and rebuilt file(s) (exit code 0
                        indicates file(s) are identical, exit code 1
                        indicates file(s) are different)
Footnotes
[1] All examples are for the bash shell
[2] It is assumed that all the Python interpreters are in the executables path. Source code for the interpreters can be downloaded from Python’s main site
[3] (1, 2) Tox configuration largely inspired by Ionel’s codelog
Changelog¶
- 1.0.8 [2019-03-22]: Documentation and dependency update
- 1.0.7 [2019-03-09]: Dropped support for Python 2.6, 3.3 and 3.4. Updates to support newest versions of dependencies. Adopted lightweight package management framework
- 1.0.6 [2017-09-10]: Fixed bug while filtering rows that have empty column specified in filter. Fixed broken multi-line links in documentation
- 1.0.5 [2017-02-10]: Package build enhancements and fixes
- 1.0.4 [2017-02-07]: Python 3.6 support
- 1.0.3 [2016-06-10]: Minor documentation build bug fix
- 1.0.2 [2016-05-12]: Minor documentation updates
- 1.0.1 [2016-05-12]: Minor documentation updates
- 1.0.0 [2016-05-12]: Final release of 1.0.0 branch
- 1.0.0rc1 [2016-05-11]: Initial commit, forked a subset from putil PyPI package
License¶
The MIT License (MIT)
Copyright (c) 2013-2019 Pablo Acosta-Serafini
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
API¶
Identifying (filtering) columns¶
Several class methods and functions in this module allow column and row filtering of the CSV file data. Both operations require identifying columns, and how columns can be identified depends on whether or not the file has a header, as indicated by the has_header boolean constructor argument:
- If has_header is True, the first line of the file is taken as the header. Columns can be identified by name (a string that has to match a column value in the file header) or by number (an integer representing the column number, with column zero being the leftmost column)
- If has_header is False, columns can only be identified by number (an integer representing the column number, with column zero being the leftmost column)
For example, if a file myfile.csv has the following data:

Ctrl | Ref | Result
---- | --- | ------
1    | 3   | 10
1    | 4   | 20
2    | 4   | 30
2    | 5   | 40
3    | 5   | 50
Then when the file is loaded with pcsv.CsvFile('myfile.csv', has_header=True) the columns can be referred to as 'Ctrl' or 0, 'Ref' or 1, or 'Result' or 2. However, if the file is loaded with pcsv.CsvFile('myfile.csv', has_header=False) the columns can only be referred to as 0, 1 or 2.
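The name-or-number rule can be sketched as a small resolver; resolve_column is a hypothetical helper for illustration, not part of the pcsv API:

```python
def resolve_column(identifier, header=None):
    # Numbers are always valid column identifiers (zero-based)
    if isinstance(identifier, int):
        return identifier
    # Names are only valid when the file has a header
    if header is None:
        raise ValueError("column names require a header")
    return header.index(identifier)

header = ["Ctrl", "Ref", "Result"]
# With a header, "Ref" and 1 refer to the same column;
# without one, only numbers work
```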
Filtering rows¶
Several class methods and functions of this module allow row filtering of the CSV file data. The row filter is described in the CsvRowFilter pseudo-type
Swapping or inserting columns¶
The column filter not only filters columns but also determines the order in which the columns are stored internally in a pcsv.CsvFile object. This means that the column filter can be used to reorder and/or duplicate columns. For example:
# pcsv_example_6.py
import pmisc, pcsv


def main():
    ctx = pmisc.TmpFile
    with ctx() as ifname:
        with ctx() as ofname:
            # Create input data file
            data = [
                ["Ctrl", "Ref", "Result"],
                [1, 3, 10],
                [1, 4, 20],
                [2, 4, 30],
                [2, 5, 40],
                [3, 5, 50],
            ]
            pcsv.write(ifname, data, append=False)
            # Swap 'Ctrl' and 'Result' columns, duplicate
            # 'Ref' column at the end
            obj = pcsv.CsvFile(fname=ifname, dfilter=["Result", "Ref", "Ctrl", 1])
            assert obj.header(filtered=False) == ["Ctrl", "Ref", "Result"]
            assert obj.header(filtered=True) == ["Result", "Ref", "Ctrl", "Ref"]
            obj.write(
                ofname,
                header=["Result", "Ref", "Ctrl", "Ref2"],
                filtered=True,
                append=False,
            )
            # Verify that resulting file is correct
            ref_data = [
                [10, 3, 1, 3],
                [20, 4, 1, 4],
                [30, 4, 2, 4],
                [40, 5, 2, 5],
                [50, 5, 3, 5],
            ]
            obj = pcsv.CsvFile(ofname, has_header=True)
            assert obj.header() == ["Result", "Ref", "Ctrl", "Ref2"]
            assert obj.data() == ref_data


if __name__ == "__main__":
    main()
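The reordering and duplication done by the column filter above amounts to indexing each row in the filtered column order. A plain-Python sketch of that operation:

```python
header = ["Ctrl", "Ref", "Result"]
rows = [[1, 3, 10], [1, 4, 20]]
# Column filter ["Result", "Ref", "Ctrl", 1] resolves to these indices
order = [2, 1, 0, 1]
filtered_header = [header[i] for i in order]
filtered_rows = [[row[i] for i in order] for row in rows]
```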
Empty columns¶
When a file has empty columns they are read as None. Conversely, any column value that is None is written as an empty column. Empty columns are ones that have either an empty string ('') or literally no information between the column delimiters (,).
For example, if a file myfile2.csv has the following data:

Ctrl | Ref | Result
---- | --- | ------
1    | 4   | 20
2    |     | 30
2    | 5   |
     | 5   | 50
The corresponding read array is:
[
    ['Ctrl', 'Ref', 'Result'],
    [1, 4, 20],
    [2, None, 30],
    [2, 5, None],
    [None, 5, 50]
]
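The None convention can be sketched with the standard csv module, which yields empty strings for empty fields; mapping those to None (and back on write) is the behavior described above. Note this sketch keeps all values as strings, whereas pcsv also converts numeric values:

```python
import csv
import io

text = "Ctrl,Ref,Result\n1,4,20\n2,,30\n2,5,\n,5,50\n"
rows = []
for record in csv.reader(io.StringIO(text)):
    # csv.reader returns '' for empty fields; map them to None
    rows.append([None if field == "" else field for field in record])
# e.g. the second data row reads as ['2', None, '30']
```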
Functions¶
pcsv.concatenate(fname1, fname2, dfilter1=None, dfilter2=None, has_header1=True, has_header2=True, frow1=0, frow2=0, ofname=None, ocols=None)¶
Concatenate two comma-separated values files.
Data rows from the second file are appended at the end of the data rows from the first file
Parameters: - fname1 (FileNameExists) – Name of the first comma-separated values file, the file whose data appears first in the output file
- fname2 (FileNameExists) – Name of the second comma-separated values file, the file whose data appears last in the output file
- dfilter1 (CsvDataFilter or None) – Row and/or column filter for the first file. If None no data filtering is done on the file
- dfilter2 (CsvDataFilter or None) – Row and/or column filter for the second file. If None no data filtering is done on the file
- has_header1 (boolean) – Flag that indicates whether the first comma-separated values file has column headers in its first line (True) or not (False)
- has_header2 (boolean) – Flag that indicates whether the second comma-separated values file has column headers in its first line (True) or not (False)
- frow1 (NonNegativeInteger) – First comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- frow2 (NonNegativeInteger) – Second comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- ofname (FileName or None) – Name of the output comma-separated values file, the file that will contain the data from the first and second files. If None the first file is replaced “in place”
- ocols (list or None) – Column names of the output comma-separated values file. If None the column names in the first file are used if has_header1 is True, or the column names in the second file are used if has_header1 is False and has_header2 is True; otherwise no header is used
Raises: - OSError (File [fname] could not be found)
- RuntimeError (Argument `dfilter1` is not valid)
- RuntimeError (Argument `dfilter2` is not valid)
- RuntimeError (Argument `fname1` is not valid)
- RuntimeError (Argument `fname2` is not valid)
- RuntimeError (Argument `frow1` is not valid)
- RuntimeError (Argument `frow2` is not valid)
- RuntimeError (Argument `ocols` is not valid)
- RuntimeError (Argument `ofname` is not valid)
- RuntimeError (Column headers are not unique in file [fname])
- RuntimeError (File [fname] has no valid data)
- RuntimeError (File [fname] is empty)
- RuntimeError (Files have different number of columns)
- RuntimeError (Invalid column specification)
- RuntimeError (Number of columns in data files and output columns are different)
- ValueError (Column [column_identifier] not found)
pcsv.dsort(fname, order, has_header=True, frow=0, ofname=None)¶
Sort file data.
Parameters: - fname (FileNameExists) – Name of the comma-separated values file to sort
- order (CsvColFilter) – Sort order
- has_header (boolean) – Flag that indicates whether the comma-separated values file to sort has column headers in its first line (True) or not (False)
- frow (NonNegativeInteger) – First data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- ofname (FileName or None) – Name of the output comma-separated values file, the file that will contain the sorted data. If None the sorting is done “in place”
Raises: - OSError (File [fname] could not be found)
- RuntimeError (Argument `fname` is not valid)
- RuntimeError (Argument `frow` is not valid)
- RuntimeError (Argument `has_header` is not valid)
- RuntimeError (Argument `ofname` is not valid)
- RuntimeError (Argument `order` is not valid)
- RuntimeError (Column headers are not unique in file [fname])
- RuntimeError (File [fname] has no valid data)
- RuntimeError (File [fname] is empty)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
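The sort applies one or more column keys, each with its own direction. The semantics can be sketched with the standard library alone; this is a minimal illustration, not pcsv's implementation, and the helper name `dsort_rows` is hypothetical (pcsv.dsort itself operates on files and accepts the CsvColSort forms described later):

```python
def dsort_rows(rows, order):
    """Sort data rows by (column_name, direction) pairs.

    Direction 'A' means ascending, 'D' means descending.
    rows[0] is the header row; the remaining rows are data.
    """
    header, body = rows[0], list(rows[1:])
    # Apply the keys right to left; Python's stable sort then makes
    # the leftmost key the dominant one.
    for col, direction in reversed(order):
        idx = header.index(col)
        body.sort(key=lambda row: row[idx], reverse=(direction.upper() == "D"))
    return [header] + body

data = [["Ctrl", "Ref"], [2, 5], [1, 4], [2, 4], [1, 3]]
# Ctrl ascending; within each Ctrl value, Ref descending
result = dsort_rows(data, [("Ctrl", "A"), ("Ref", "D")])
```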
pcsv.merge(fname1, fname2, dfilter1=None, dfilter2=None, has_header1=True, has_header2=True, frow1=0, frow2=0, ofname=None, ocols=None)¶ Merge two comma-separated values files.
Data columns from the second file are appended after data columns from the first file. Empty values in columns are used if the files have different number of rows
Parameters: - fname1 (FileNameExists) – Name of the first comma-separated values file, the file whose columns appear first in the output file
- fname2 (FileNameExists) – Name of the second comma-separated values file, the file whose columns appear last in the output file
- dfilter1 (CsvDataFilter or None) – Row and/or column filter for the first file. If None no data filtering is done on the file
- dfilter2 (CsvDataFilter or None) – Row and/or column filter for the second file. If None no data filtering is done on the file
- has_header1 (boolean) – Flag that indicates whether the first comma-separated values file has column headers in its first line (True) or not (False)
- has_header2 (boolean) – Flag that indicates whether the second comma-separated values file has column headers in its first line (True) or not (False)
- frow1 (NonNegativeInteger) – First comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- frow2 (NonNegativeInteger) – Second comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- ofname (FileName or None) – Name of the output comma-separated values file, the file that will contain the data from the first and second files. If None the first file is replaced “in place”
- ocols (list or None) – Column names of the output comma-separated values file. If None the column names in the first and second files are used if has_header1 and/or has_header2 are True. The column labels 'Column [column_number]' are used when one of the two files does not have a header, where [column_number] is an integer representing the column number (column 0 is the leftmost column). No header is used if has_header1 and has_header2 are False
Raises: - OSError (File [fname] could not be found)
- RuntimeError (Argument `dfilter1` is not valid)
- RuntimeError (Argument `dfilter2` is not valid)
- RuntimeError (Argument `fname1` is not valid)
- RuntimeError (Argument `fname2` is not valid)
- RuntimeError (Argument `frow1` is not valid)
- RuntimeError (Argument `frow2` is not valid)
- RuntimeError (Argument `ocols` is not valid)
- RuntimeError (Argument `ofname` is not valid)
- RuntimeError (Column headers are not unique in file [fname])
- RuntimeError (Combined columns in data files and output columns are different)
- RuntimeError (File [fname] has no valid data)
- RuntimeError (File [fname] is empty)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
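The padding behavior described above ("Empty values in columns are used if the files have different number of rows") can be sketched with `itertools.zip_longest`; this is an illustration of the documented semantics, not pcsv's implementation, and the helper name `merge_rows` is hypothetical:

```python
from itertools import zip_longest

def merge_rows(rows1, rows2, pad=""):
    """Append the columns of rows2 after the columns of rows1.

    The shorter data set is padded with empty values so every
    output row has the same number of columns.
    """
    ncols1 = len(rows1[0])
    ncols2 = len(rows2[0])
    out = []
    for r1, r2 in zip_longest(rows1, rows2):
        r1 = r1 if r1 is not None else [pad] * ncols1
        r2 = r2 if r2 is not None else [pad] * ncols2
        out.append(list(r1) + list(r2))
    return out

left = [["Item", "Cost"], [1, 9.99], [2, 10000]]
right = [["Staff", "Rate"], ["Joe", 10], ["Sue", 20], ["Pat", 15]]
merged = merge_rows(left, right)
```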
pcsv.replace(fname1, fname2, dfilter1, dfilter2, has_header1=True, has_header2=True, frow1=0, frow2=0, ofname=None, ocols=None)¶ Replace data in one file with data from another file.
Parameters: - fname1 (FileNameExists) – Name of the input comma-separated values file, the file that contains the columns to be replaced
- fname2 (FileNameExists) – Name of the replacement comma-separated values file, the file that contains the replacement data
- dfilter1 (CsvDataFilter) – Row and/or column filter for the input file
- dfilter2 (CsvDataFilter) – Row and/or column filter for the replacement file
- has_header1 (boolean) – Flag that indicates whether the input comma-separated values file has column headers in its first line (True) or not (False)
- has_header2 (boolean) – Flag that indicates whether the replacement comma-separated values file has column headers in its first line (True) or not (False)
- frow1 (NonNegativeInteger) – Input comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- frow2 (NonNegativeInteger) – Replacement comma-separated values file first data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
- ofname (FileName) – Name of the output comma-separated values file, the file that will contain the input file data but with some columns replaced with data from the replacement file. If None the input file is replaced “in place”
- ocols (list or None) – Names of the replaced columns in the output comma-separated values file. If None the column names in the input file are used if has_header1 is True, otherwise no header is used
Raises: - OSError (File [fname] could not be found)
- RuntimeError (Argument `dfilter1` is not valid)
- RuntimeError (Argument `dfilter2` is not valid)
- RuntimeError (Argument `fname1` is not valid)
- RuntimeError (Argument `fname2` is not valid)
- RuntimeError (Argument `frow1` is not valid)
- RuntimeError (Argument `frow2` is not valid)
- RuntimeError (Argument `ocols` is not valid)
- RuntimeError (Argument `ofname` is not valid)
- RuntimeError (Column headers are not unique in file [fname])
- RuntimeError (File [fname] has no valid data)
- RuntimeError (File [fname] is empty)
- RuntimeError (Invalid column specification)
- RuntimeError (Number of input and output columns are different)
- RuntimeError (Number of input and replacement columns are different)
- ValueError (Column [column_identifier] not found)
- ValueError (Number of rows mismatch between input and replacement data)
pcsv.write(fname, data, append=True)¶ Write data to a specified comma-separated values (CSV) file.
Parameters: - fname (FileName) – Name of the comma-separated values file to be written
- data (list) – Data to write to the file. Each item in this argument should contain a sub-list corresponding to a row of data; each item in the sub-lists should contain data corresponding to a particular column
- append (boolean) – Flag that indicates whether data is added to an existing file (or a new file is created if it does not exist) (True), or whether data overwrites the file contents (if the file exists) or creates a new file if the file does not exist (False)
Raises: - OSError (File [fname] could not be created: [reason])
- RuntimeError (Argument `append` is not valid)
- RuntimeError (Argument `data` is not valid)
- RuntimeError (Argument `fname` is not valid)
- ValueError (There is no data to save to file)
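The append flag maps naturally onto file open modes. A stdlib sketch of the documented behavior (the helper name `write_csv` is hypothetical; pcsv.write performs its own argument validation on top of this):

```python
import csv
import os
import tempfile

def write_csv(fname, data, append=True):
    # append=True adds to an existing file (creating it if needed);
    # append=False overwrites any existing contents.
    mode = "a" if append else "w"
    with open(fname, mode, newline="") as fobj:
        csv.writer(fobj).writerows(data)

fname = os.path.join(tempfile.mkdtemp(), "data.csv")
write_csv(fname, [["Item", "Cost"]], append=False)
write_csv(fname, [[1, 9.99]], append=True)
with open(fname, newline="") as fobj:
    rows = list(csv.reader(fobj))  # note: csv.reader yields strings
```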
Class¶
class pcsv.CsvFile(fname, dfilter=None, has_header=True, frow=0)¶ Bases: object
Process comma-separated values (CSV) files.
Parameters: - fname (FileNameExists) – Name of the comma-separated values file to read
- dfilter (CsvDataFilter or None) – Row and/or column filter. If None no data filtering is done
- has_header (boolean) – Flag that indicates whether the comma-separated values file has column headers in its first line (True) or not (False)
- frow (NonNegativeInteger) – First data row (starting from 1). If 0 the row where data starts is auto-detected as the first row that has a number (integer or float) in at least one of its columns
Return type: pcsv.CsvFile object
Raises: - OSError (File [fname] could not be found)
- RuntimeError (Argument `dfilter` is not valid)
- RuntimeError (Argument `fname` is not valid)
- RuntimeError (Argument `frow` is not valid)
- RuntimeError (Argument `has_header` is not valid)
- RuntimeError (Column headers are not unique in file [fname])
- RuntimeError (File [fname] has no valid data)
- RuntimeError (File [fname] is empty)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
__eq__(other)¶ Test object equality.
For example:
>>> import pmisc, pcsv
>>> with pmisc.TmpFile() as fname:
...     pcsv.write(fname, [['a'], [1]], append=False)
...     obj1 = pcsv.CsvFile(fname, dfilter='a')
...     obj2 = pcsv.CsvFile(fname, dfilter='a')
...
>>> with pmisc.TmpFile() as fname:
...     pcsv.write(fname, [['a'], [2]], append=False)
...     obj3 = pcsv.CsvFile(fname, dfilter='a')
...
>>> obj1 == obj2
True
>>> obj1 == obj3
False
>>> 5 == obj3
False
__repr__()¶ Return a string with the expression needed to re-create the object.
For example:
>>> import pmisc, pcsv
>>> with pmisc.TmpFile() as fname:
...     pcsv.write(fname, [['a'], [1]], append=False)
...     obj1 = pcsv.CsvFile(fname, dfilter='a')
...     exec("obj2="+repr(obj1))
>>> obj1 == obj2
True
>>> repr(obj1)
"pcsv.CsvFile(fname=r'...', dfilter=['a'])"
add_dfilter(dfilter)¶ Add more row(s) or column(s) to the existing data filter.
Duplicate filter values are eliminated
Parameters: dfilter (CsvDataFilter) – Row and/or column filter Raises: - RuntimeError (Argument `dfilter` is not valid)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
cols(filtered=False)¶ Return the number of data columns.
Parameters: filtered (boolean) – Flag that indicates whether the raw (input) data should be used (False) or whether filtered data should be used (True) Raises: RuntimeError (Argument `filtered` is not valid)
data(filtered=False, no_empty=False)¶ Return (filtered) file data.
The returned object is a list, each item is a sub-list corresponding to a row of data; each item in the sub-lists contains data corresponding to a particular column
Parameters: - filtered (CsvFiltered) – Filtering type
- no_empty (bool) – Flag that indicates whether rows with empty columns should be filtered out (True) or not (False)
Return type: list
Raises: - RuntimeError (Argument `filtered` is not valid)
- RuntimeError (Argument `no_empty` is not valid)
dsort(order)¶ Sort rows.
Parameters: order (CsvColFilter) – Sort order Raises: - RuntimeError (Argument `order` is not valid)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
header(filtered=False)¶ Return data header.
When the raw (input) data is used the data header is a list of the comma-separated values file header if the file is loaded with header (each list item is a column header) or a list of column numbers if the file is loaded without header (column zero is the leftmost column). When filtered data is used the data header is the active column filter, if any, otherwise it is the same as the raw (input) data header
Parameters: filtered (boolean) – Flag that indicates whether the raw (input) data should be used (False) or whether filtered data should be used (True) Return type: list of strings or integers Raises: RuntimeError (Argument `filtered` is not valid)
replace(rdata, filtered=False)¶ Replace data.
Parameters: - rdata (list of lists) – Replacement data
- filtered (CsvFiltered) – Filtering type
Raises: - RuntimeError (Argument `filtered` is not valid)
- RuntimeError (Argument `rdata` is not valid)
- ValueError (Number of columns mismatch between input and replacement data)
- ValueError (Number of rows mismatch between input and replacement data)
reset_dfilter(ftype=True)¶ Reset (clear) the data filter.
Parameters: ftype (CsvFiltered) – Filter type Raises: RuntimeError (Argument `ftype` is not valid)
rows(filtered=False)¶ Return the number of data rows.
Parameters: filtered (boolean) – Flag that indicates whether the raw (input) data should be used (False) or whether filtered data should be used (True) Raises: RuntimeError (Argument `filtered` is not valid)
write(fname=None, filtered=False, header=True, append=False)¶ Write (processed) data to a specified comma-separated values (CSV) file.
Parameters: - fname (FileName) – Name of the comma-separated values file to be written. If None the file from which the data originated is overwritten
- filtered (CsvFiltered) – Filtering type
- header (string, list of strings or boolean) – If a list, column headers to use in the file. If boolean, flag that indicates whether the input column headers should be written (True) or not (False)
- append (boolean) – Flag that indicates whether data is added to an existing file (or a new file is created if it does not exist) (True), or whether data overwrites the file contents (if the file exists) or creates a new file if the file does not exist (False)
Raises: - OSError (File [fname] could not be created: [reason])
- RuntimeError (Argument `append` is not valid)
- RuntimeError (Argument `filtered` is not valid)
- RuntimeError (Argument `fname` is not valid)
- RuntimeError (Argument `header` is not valid)
- RuntimeError (Argument `no_empty` is not valid)
- ValueError (There is no data to save to file)
cfilter¶ Set or return the column filter.
Type: CsvColFilter or None. If None no column filtering is done Return type: CsvColFilter or None Raises: (when assigned)
- RuntimeError (Argument `cfilter` is not valid)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
dfilter¶ Set or return the data (row and/or column) filter.
The first tuple item is the row filter and the second tuple item is the column filter
Type: CsvDataFilter or None. If None no data filtering is done Return type: CsvDataFilter or None Raises: (when assigned)
- RuntimeError (Argument `dfilter` is not valid)
- RuntimeError (Invalid column specification)
- ValueError (Column [column_identifier] not found)
rfilter¶ Set or return the row filter.
Type: CsvRowFilter or None. If None no row filtering is done Return type: CsvRowFilter or None Raises: (when assigned)
- RuntimeError (Argument `rfilter` is not valid)
- RuntimeError (Invalid column specification)
- ValueError (Argument `rfilter` is empty)
- ValueError (Column [column_identifier] not found)
Contracts pseudo-types¶
Introduction¶
The pseudo-types defined below can be used in contracts of the PyContracts or Pexdoc libraries. As an example, with the latter:
>>> from __future__ import print_function
>>> import pexdoc
>>> from pcsv.ptypes import csv_col_filter
>>> @pexdoc.pcontracts.contract(cfilter='csv_col_filter')
... def myfunc(cfilter):
...     print('CSV filter received: '+cfilter)
...
>>> myfunc('m')
CSV filter received: m
>>> myfunc(35+3j)
Traceback (most recent call last):
    ...
RuntimeError: Argument `cfilter` is not valid
Alternatively each pseudo-type has a checker function associated with it that can be used to verify membership. For example:
>>> import pcsv.ptypes
>>> # None is returned if object belongs to pseudo-type
>>> pcsv.ptypes.csv_col_filter('m')
>>> # ValueError is raised if object does not belong to pseudo-type
>>> pcsv.ptypes.csv_col_filter(35+3j)
Traceback (most recent call last):
    ...
ValueError: [START CONTRACT MSG: csv_col_filter]...
Description¶
CsvColFilter¶
Import as csv_col_filter. String, integer, a list of strings or a list of integers that identify a column or columns within a comma-separated values (CSV) file.
Integers identify a column by position (column 0 is the leftmost column) whereas strings identify the column by name. Columns can be identified either by position or by name when the file has a header (first row of file containing column labels) but only by position when the file does not have a header.
None indicates that no column filtering should be done
CsvColSort¶
Import as csv_col_sort. Integer, string, dictionary or list of integers, strings or dictionaries that specify the sort direction of a column or columns in a comma-separated values (CSV) file.
The sort direction can be either ascending, specified by the string 'A', or descending, specified by the string 'D' (case insensitive). The default sort direction is ascending.
The column can be specified numerically or with labels depending on whether the CSV file was loaded with or without a header.
The full specification is a dictionary (or list of dictionaries if multiple columns are to be used for sorting) where the key is the column and the value is the sort order, thus valid examples are {'MyCol':'A'} and [{'MyCol':'A'}, {3:'d'}].
When the default direction suffices it can be omitted; for example in [{'MyCol':'D'}, 3], the data is sorted first by MyCol in descending order and then by the 4th column (column 0 is the leftmost column in a CSV file) in ascending order
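The accepted forms can be normalized into a uniform list of (column, direction) pairs. A minimal sketch of that normalization (the helper name `normalize_order` is hypothetical, not part of pcsv):

```python
def normalize_order(order):
    """Normalize a CsvColSort-style spec into (column, direction) pairs.

    Accepts a single item or a list; bare columns default to
    ascending ('A'), dictionaries carry an explicit direction.
    """
    items = order if isinstance(order, list) else [order]
    pairs = []
    for item in items:
        if isinstance(item, dict):
            # One-entry dictionary: {column: direction}
            (col, direction), = item.items()
            pairs.append((col, direction.upper()))
        else:
            pairs.append((item, "A"))
    return pairs

spec = normalize_order([{"MyCol": "D"}, 3])
```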
CsvDataFilter¶
Import as csv_data_filter. In its most general form a two-item tuple, where one item is of CsvColFilter pseudo-type and the other item is of CsvRowFilter pseudo-type (the order of the items is not mandated, i.e. the first item could be of pseudo-type CsvRowFilter and the second item could be of pseudo-type CsvColFilter or vice-versa).
The two-item tuple can be reduced to a one-item tuple when only a row or column filter needs to be specified, or simply to an object of either CsvRowFilter or CsvColFilter pseudo-type.
For example, all of the following are valid CsvDataFilter objects: ('MyCol', {'MyCol':2.5}) and ({'MyCol':2.5}, 'MyCol') (filter in the column labeled MyCol and rows where the column labeled MyCol has the value 2.5), ('MyCol', ) (filter in the column labeled MyCol and all rows) and {'MyCol':2.5} (filter in all columns and only rows where the column labeled MyCol has the value 2.5)
None, (None, ) or (None, None) indicate that no row or column filtering should be done
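Because row filters are dictionaries and column filters are strings, integers or lists, the two items of a CsvDataFilter can be told apart by type alone. A sketch of that classification (the helper name `split_dfilter` is hypothetical, not part of pcsv):

```python
def split_dfilter(dfilter):
    """Return (rfilter, cfilter) from any accepted CsvDataFilter form.

    Dictionaries are row filters; strings, integers and lists are
    column filters. Item order inside the tuple does not matter.
    """
    if dfilter is None:
        return (None, None)
    items = dfilter if isinstance(dfilter, tuple) else (dfilter,)
    rfilter = cfilter = None
    for item in items:
        if isinstance(item, dict):
            rfilter = item
        elif item is not None:
            cfilter = item
    return (rfilter, cfilter)
```

For example, both `('MyCol', {'MyCol': 2.5})` and `({'MyCol': 2.5}, 'MyCol')` split into the same row filter and column filter.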
CsvFiltered¶
Import as csv_filtered. String or a boolean that indicates what type of row and column filtering is to be performed in a comma-separated values (CSV) file. If True, 'B' or 'b' both row- and column-filtering are performed; if False, 'N' or 'n' no filtering is performed; if 'R' or 'r' only row-filtering is performed; if 'C' or 'c' only column-filtering is performed
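The accepted values collapse onto four canonical modes. A minimal sketch of that mapping (the helper name `filter_mode` is hypothetical, not part of pcsv):

```python
def filter_mode(filtered):
    """Map an accepted CsvFiltered value onto a canonical letter.

    'B' = both row and column filtering, 'N' = none,
    'R' = rows only, 'C' = columns only.
    """
    if isinstance(filtered, bool):
        return "B" if filtered else "N"
    letter = filtered.upper()
    if letter not in ("B", "N", "R", "C"):
        raise ValueError("Argument `filtered` is not valid")
    return letter
```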
CsvRowFilter¶
Import as csv_row_filter. Dictionary whose elements are sub-filters with the following structure:
- column identifier (CsvColFilter) – Dictionary key. Column to filter (as it appears in the comma-separated values file header when a string is given) or column number (when an integer is given, column zero is the leftmost column)
- value (list of strings or numbers, or string or number) – Dictionary value. Column value to filter if a string or number, column values to filter if a list of strings or numbers
If a row filter sub-filter is a column value all rows which contain the specified value in the specified column are kept for that particular individual filter. The overall data set is the intersection of all the data sets specified by each individual sub-filter. For example, if the file to be processed is:
Ctrl | Ref | Result
---- | --- | ------
1 | 3 | 10
1 | 4 | 20
2 | 4 | 30
2 | 5 | 40
3 | 5 | 50
Then the filter specification rfilter = {'Ctrl':2, 'Ref':5} would result in the following filtered data set:
Ctrl | Ref | Result
---- | --- | ------
2 | 5 | 40
However, the filter specification rfilter = {'Ctrl':2, 'Ref':3} would result in an empty list because the data set specified by the Ctrl individual sub-filter does not overlap with the data set specified by the Ref individual sub-filter.
If a row sub-filter is a list, the items of the list represent all the values to be kept for a particular column (strings or numbers). So for example rfilter = {'Ctrl':[2, 3], 'Ref':5} would result in the following filtered data set:
Ctrl | Ref | Result
---- | --- | ------
2 | 5 | 40
3 | 5 | 50
None indicates that no row filtering should be done
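The intersection semantics of the sub-filters can be sketched directly, reproducing the worked examples above (the helper name `apply_rfilter` is hypothetical, not part of pcsv):

```python
def apply_rfilter(header, rows, rfilter):
    """Keep a row only if it matches every sub-filter (intersection).

    Scalar sub-filter values are treated as one-item lists.
    """
    def matches(row):
        for col, values in rfilter.items():
            values = values if isinstance(values, list) else [values]
            if row[header.index(col)] not in values:
                return False
        return True
    return [row for row in rows if matches(row)]

header = ["Ctrl", "Ref", "Result"]
rows = [[1, 3, 10], [1, 4, 20], [2, 4, 30], [2, 5, 40], [3, 5, 50]]
kept = apply_rfilter(header, rows, {"Ctrl": [2, 3], "Ref": 5})
```

Running the three filter specifications from the text against this data set yields the documented results: one row for {'Ctrl':2, 'Ref':5}, an empty list for {'Ctrl':2, 'Ref':3}, and two rows for {'Ctrl':[2, 3], 'Ref':5}.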
Checker functions¶
pcsv.ptypes.csv_col_filter(obj)¶ Validate if an object is a CsvColFilter pseudo-type object.
Parameters: obj (any) – Object
Raises: - RuntimeError (Argument `*[argument_name]*` is not valid). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
Return type: None
pcsv.ptypes.csv_col_sort(obj)¶ Validate if an object is a CsvColSort pseudo-type object.
Parameters: obj (any) – Object
Raises: - RuntimeError (Argument `*[argument_name]*` is not valid). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
Return type: None
pcsv.ptypes.csv_data_filter(obj)¶ Validate if an object is a CsvDataFilter pseudo-type object.
Parameters: obj (any) – Object
Raises: - RuntimeError (Argument `*[argument_name]*` is not valid). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
Return type: None
pcsv.ptypes.csv_filtered(obj)¶ Validate if an object is a CsvFiltered pseudo-type object.
Parameters: obj (any) – Object
Raises: - RuntimeError (Argument `*[argument_name]*` is not valid). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
Return type: None
pcsv.ptypes.csv_row_filter(obj)¶ Validate if an object is a CsvRowFilter pseudo-type object.
Parameters: obj (any) – Object
Raises: - RuntimeError (Argument `*[argument_name]*` is not valid). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
- ValueError (Argument `*[argument_name]*` is empty). The token *[argument_name]* is replaced by the name of the argument the contract is attached to
Return type: None