Package: ff 4.5.0
ff: Memory-Efficient Storage of Large Data on Disk and Fast Access Functions
The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory, which is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and the non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned) and single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for the close-to-atomic types 'factor', 'ordered', 'POSIXct' and 'Date', and for custom close-to-atomic types. ff not only has native C support for vectors, matrices and arrays with flexible dimorder (column-major order, row-major order and generalizations for arrays), but also provides an ffdf class not unlike data.frames and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options makes it possible to work with 'permanent' files as well as to create and remove 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
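As a minimal sketch of the storage model described above, assuming the ff package is installed, the following creates a disk-backed double vector and a 2-bit 'quad' vector in temporary files. The variable names and sizes are illustrative only and do not come from the package documentation.

```r
library(ff)

# Disk-backed double vector; with no filename given, ff uses a temporary file.
x <- ff(vmode = "double", length = 100000)
x[1:5] <- c(1.5, 2.5, 3.5, 4.5, 5.5)  # written through the memory-mapped pages
first5 <- x[1:5]                      # subsetting returns an ordinary R vector

# 2-bit unsigned 'quad' vector: each element holds one of the values 0..3.
q <- ff(vmode = "quad", length = 8)
q[1:8] <- c(0L, 1L, 2L, 3L, 3L, 2L, 1L, 0L)
codes <- q[]

# Bytes per element on disk versus in RAM for this vmode.
bytes_on_disk <- .ffbytes[["quad"]]
bytes_in_ram  <- .rambytes[["quad"]]

print(vmode(x))
print(vmode(q))
```

Only the pages currently mapped occupy RAM; the compact on-disk width of 'quad' is what makes the genomic-factor use case cheap to store.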
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for both ff and ram objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This makes it possible to work interactively with selections of large datasets and to quickly modify selection criteria. Further high-performance enhancements can be made available upon request.
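The following sketch illustrates three of the features mentioned above: virtual transpose with vt, chunked looping with chunk, and moving between ff and ram objects with as.ram and as.ff. It assumes ff (and its dependency bit) is installed; the chunk size passed via `by` and all variable names are illustrative choices, and passing `by` through chunk to bit's chunking is an assumption about the installed versions.

```r
library(ff)

# Integer ff matrix in standard (column-major) dimorder.
m <- ff(1:12, dim = c(3, 4))

# Virtual transpose: vt() modifies virtual attributes only;
# no matrix elements are physically copied on disk.
tm <- vt(m)
dims_tm <- dim(tm)
same_as_t <- all(tm[] == t(m[]))

# Chunked processing: loop over range indices instead of loading everything.
x <- ff(1:100000)
total <- 0
for (i in chunk(x, by = 25000)) {
  # as.numeric avoids integer overflow when summing a chunk
  total <- total + sum(as.numeric(x[i]))
}

# Moving between ff and ram objects.
r <- as.ram(x)  # ordinary in-memory R vector
f <- as.ff(r)   # disk-backed again, in a new temporary file
```

Because each chunk is read as an ordinary R vector, the loop body can use any base R function, and peak memory is bounded by the chunk size rather than by the length of `x`.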
Authors:
ff_4.5.0.tar.gz
ff_4.5.0.zip (r-4.5), ff_4.5.0.zip (r-4.4), ff_4.5.0.zip (r-4.3)
ff_4.5.0.tgz (r-4.4-x86_64), ff_4.5.0.tgz (r-4.4-arm64), ff_4.5.0.tgz (r-4.3-x86_64), ff_4.5.0.tgz (r-4.3-arm64)
ff_4.5.0.tar.gz (r-4.5-noble), ff_4.5.0.tar.gz (r-4.4-noble)
ff.pdf | ff.html
ff/json (API)
NEWS
# Install 'ff' in R:
install.packages('ff', repos = c('https://truecluster.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/truecluster/ff/issues
Last updated 2 months ago from: 1eb1a99ed8. Checks: OK: 9. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Nov 17 2024 |
R-4.5-win-x86_64 | OK | Nov 17 2024 |
R-4.5-linux-x86_64 | OK | Nov 17 2024 |
R-4.4-win-x86_64 | OK | Nov 17 2024 |
R-4.4-mac-x86_64 | OK | Nov 17 2024 |
R-4.4-mac-aarch64 | OK | Nov 17 2024 |
R-4.3-win-x86_64 | OK | Nov 17 2024 |
R-4.3-mac-x86_64 | OK | Nov 17 2024 |
R-4.3-mac-aarch64 | OK | Nov 17 2024 |
Exports: .ffbytes .ffmode .rambytes .rammode .vcoerceable .vimplemented .vmax .vmin .vmode .vNA .vunsigned .vvalues add appendLevels array2vector arrayIndex2vectorIndex as.boolean as.byte as.ff as.ffdf as.hi as.nibble as.quad as.ram as.short as.ubyte as.ushort as.vmode bigsample boolean byte ccbind cfun clength clone.ff clone.ffdf cmean cmedian cquantile crbind csum csummary delete deleteIfOpen dforder dfsort dimorder dimorder<- dimorderCompatible dimorderStandard dummy.dimnames ff ffapply ffcolapply ffconform ffdf ffdfindexget ffdfindexset ffdforder ffdfsort ffdrop ffindexget ffindexorder ffindexordersize ffindexset ffinfo ffload fforder ffreturn ffrowapply ffsave ffsave.image ffsort ffsuitable ffsymmxtensions fftempfile ffvecapply ffxtensions file.move file.resize filename filename<- finalize finalizer finalizer<- fixdiag fixdiag<- get.ff getalignedpagesize getdefaultpagesize geterror.ff geterrstr.ff getpagesize getset.ff hi hiparse is.factor is.ff is.ffdf is.open is.ordered is.readonly matcomb matprint maxffmode maxlength mismatch ncol<- nibble nrow<- pagesize pattern pattern<- quad ram2ffcode ram2ramcode ramattribs ramclass ramdforder ramdfsort read.csv.ffdf read.csv2.ffdf read.delim.ffdf read.delim2.ffdf read.ff read.table.ffdf readwrite.ff recodeLevels regtest.fforder regtest.vmode repnam set.ff short sortLevels splitPathFile standardPathFile subscript2integer swap symmetric symmIndex2vectorIndex tempPathFile ubyte unclass<- undim unsort unsplitPathFile ushort vecprint vector.vmode vector2array vectorCompatible vectorIndex2arrayIndex vectorStandard vmode vmode<- vt vw vw<- write.csv write.csv.ffdf write.csv2 write.csv2.ffdf write.ff write.table.ffdf ymismatch
Dependencies: bit
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Incrementing an ff or ram object | add add.default add.ff |
Array: make vector from array | array2vector |
Array: make vector positions from array index | arrayIndex2vectorIndex |
Coercing ram to ff and ff to ram objects | as.ff as.ff.default as.ff.ff as.ram as.ram.default as.ram.ff |
Conversion between bit and ff boolean | as.bit.ff as.ff.bit |
Coercing to ffdf and data.frame | as.data.frame.ffdf as.ffdf as.ffdf.data.frame as.ffdf.ff_matrix as.ffdf.ff_vector |
Hybrid Index, coercion to | as.hi as.hi.( as.hi.bit as.hi.bitwhich as.hi.call as.hi.character as.hi.double as.hi.hi as.hi.integer as.hi.logical as.hi.matrix as.hi.name as.hi.NULL as.hi.ri as.hi.which |
Hybrid Index, coercing from | as.bit.hi as.bitwhich.hi as.character.hi as.integer.hi as.logical.hi as.matrix.hi as.which.hi |
Coercing to virtual mode | as.boolean as.boolean.default as.byte as.byte.default as.nibble as.nibble.default as.quad as.quad.default as.short as.short.default as.ubyte as.ubyte.default as.ushort as.ushort.default as.vmode as.vmode.default as.vmode.ff |
Sampling from large pools | bigsample bigsample.default bigsample.ff |
Collapsing functions for batch processing | ccbind CFUN cfun clength cmean cmedian cquantile crbind csum csummary |
Chunk ff_vector and ffdf | chunk.ffdf chunk.ff_vector |
Cloning ff and ram objects | clone.ff |
Cloning ffdf objects | clone.ffdf |
Closing ff files | close.ff close.ffdf close.ff_pointer |
Deleting the file behind an ff object | delete delete.default delete.ff delete.ffdf delete.ff_pointer deleteIfOpen deleteIfOpen.ff deleteIfOpen.ff_pointer |
Getting and setting dim and dimorder | dim.ff dim.ffdf dim<-.ff dim<-.ffdf dimorder dimorder.default dimorder.ffdf dimorder.ff_array dimorder<- dimorder<-.ffdf dimorder<-.ff_array |
Getting and setting dimnames | dimnames.ff dimnames.ff_array dimnames<-.ff_array |
Getting and setting dimnames of ffdf | dimnames.ffdf dimnames<-.ffdf names.ffdf names<-.ffdf row.names.ffdf row.names<-.ffdf |
Test for dimorder compatibility | dimorderCompatible dimorderStandard vectorCompatible vectorStandard |
Array: make dimnames | dummy.dimnames |
Reading and writing vectors and arrays (high-level) | Extract.ff [.ff [.ff_array [<-.ff [<-.ff_array [[.ff [[<-.ff |
Reading and writing data.frames (ffdf) | $.ffdf $<-.ffdf Extract.ffdf [.ffdf [<-.ffdf [[.ffdf [[<-.ffdf |
ff classes for representing (large) atomic data | ff ff_pointer |
Apply for ff objects | ffapply ffcolapply ffrowapply ffvecapply |
Get most conforming argument | ffconform |
ff class for data.frames | ffdf |
Reading and writing ffdf data.frame using ff subscripts | ffdfindexget ffdfindexset |
Sorting: convenience wrappers for data.frames | dforder dfsort ffdforder ffdfsort ramdforder ramdfsort |
Delete an ffarchive | ffdrop |
Reading and writing ff vectors using ff subscripts | ffindexget ffindexset |
Sorting: chunked ordering of integer subscript positions | ffindexorder ffindexordersize |
Inspect content of ff saves | ffinfo |
Reload ffSaved Datasets | ffload |
Sorting: order from ff vectors | fforder |
Return suitable ff object | ffreturn |
Save R and ff objects | ffsave ffsave.image |
Sorting of ff vectors | ffsort |
Test ff object for suitability | ffsuitable ffsuitable_attribs |
Test for availability of ff extensions | ffsymmxtensions ffxtensions |
Change size of or move an existing file | file.move file.resize |
Get or set filename | filename filename.default filename.ffdf filename.ff_pointer filename<- filename<-.ff pattern pattern.ff pattern<- pattern<-.ff pattern<-.ffdf |
Call finalizer | finalize finalize.ff finalize.ffdf finalize.ff_pointer |
Get and set finalizer (name) | finalizer finalizer.ff finalizer<- finalizer<-.ff |
Test for fixed diagonal | fixdiag fixdiag.default fixdiag.dist fixdiag.ff fixdiag<- |
Get error and error string | geterror.ff geterrstr.ff |
Get page size information | getalignedpagesize getdefaultpagesize getpagesize |
Reading and writing vectors of values (low-level) | get.ff getset.ff set.ff |
Hybrid index class | hi print.hi str.hi |
Hybrid Index, parsing | hiparse |
Test for class ff | is.ff |
Test for class ffdf | is.ffdf |
Test if object is opened | is.open is.open.ff is.open.ffdf is.open.ff_pointer |
Get readonly status | is.readonly is.readonly.ff |
Getting and setting 'is.sorted' physical attribute | is.sorted.default is.sorted<-.default |
Getting and setting length | length.ff length<-.ff |
Getting length of a ffdf dataframe | length.ffdf |
Hybrid Index, querying | length.hi maxindex.hi poslength.hi |
Getting and setting factor levels | is.factor is.factor.default is.factor.ff is.ordered is.ordered.default is.ordered.ff levels.ff levels<-.ff |
ff Limitations and Warnings | LimWarn |
Array: make matrix indices from row and columns positions | matcomb |
Print beginning and end of big matrix | matprint print.matprint |
Lossless vmode coercibility | maxffmode |
Get physical length of an ff or ram object | maxlength maxlength.default maxlength.ff |
Test for recycle mismatch | mismatch ymismatch |
Getting and setting 'na.count' physical attribute | na.count.default na.count.ff na.count<-.default na.count<-.ff |
Getting and setting names | names.ff names.ff_array names<-.ff names<-.ff_array |
Assigning the number of rows or columns | ncol<- nrow<- |
Opening an ff file | open.ff open.ffdf |
Pagesize of ff object | pagesize pagesize.ff |
Getting and setting physical and virtual attributes of ff objects | physical.ff physical<-.ff virtual.ff virtual<-.ff |
Getting physical and virtual attributes of ffdf objects | physical.ffdf virtual.ffdf |
Print and str methods | print.ff print.ffdf print.ff_matrix print.ff_vector str.ff str.ffdf |
Factor codings | ram2ffcode ram2ramcode |
Get ramclass and ramattribs | ramattribs ramattribs.default ramattribs.ff ramattribs_excludes ramclass ramclass.default ramclass.ff ramclass_excludes |
Sorting: order R vector in-RAM and in-place | keyorder.default mergeorder.default radixorder.default ramorder.default shellorder.default |
Sorting: Sort R vector in-RAM and in-place | keysort.default mergesort.default radixsort.default ramsort.default shellsort.default |
Importing csv files into ff data.frames | read.csv.ffdf read.csv2.ffdf read.delim.ffdf read.delim2.ffdf read.table.ffdf |
Reading and writing vectors (low-level) | read.ff readwrite.ff write.ff |
Sorting: regression tests | regtest.fforder |
Replicate with names | repnam |
Factor level manipulation | appendLevels recodeLevels recodeLevels.factor recodeLevels.ff sortLevels sortLevels.factor sortLevels.ff sortLevels.ffdf |
Analyze pathfile-strings | fftempfile splitPathFile standardPathFile tempPathFile unsplitPathFile |
Reading and writing in one operation (high-level) | swap swap.default swap.ff swap.ff_array |
Test for symmetric structure | symmetric symmetric.default symmetric.dist symmetric.ff |
Array: make vector positions from symmetric array index | symmIndex2vectorIndex |
Unclassed assignment | unclass<- |
Undim | undim |
Hybrid Index, internal utilities | subscript2integer unsort unsort.ahi unsort.hi |
Update ff content from another object | update.ff update.ffdf |
Print beginning and end of big vector | print.vecprint vecprint |
Create vector of virtual mode | boolean byte nibble quad short ubyte ushort vector.vmode vector.vmode.default vector.vmode.ff |
Array: make array from vector | vector2array |
Array: make array from index vector positions | vectorIndex2arrayIndex |
Virtual storage mode | .ffbytes .ffmode .rambytes .rammode .vcoerceable .vimplemented .vmax .vmin .vmode .vNA .vunsigned .vvalues regtest.vmode vmode vmode.default vmode.ff vmode<- vmode<-.default vmode<-.ff |
Virtual storage mode of ffdf | vmode.ffdf |
Virtual transpose | t.ff vt vt.default vt.ff |
Getting and setting virtual windows | vw vw.default vw.ff vw<- vw<-.ff_array vw<-.ff_vector |
Exporting csv files from ff data.frames | write.csv write.csv.ffdf write.csv2 write.csv2.ffdf write.table.ffdf |
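
To connect the csv import and export topics listed above with the ffdf class, here is a hedged sketch of a round trip through a temporary csv file. It assumes ff is installed and uses only numeric columns; the data, the column names and the temporary file path are made up for illustration.

```r
library(ff)

# Small in-memory data.frame converted to an ffdf (each column becomes an ff vector).
df <- data.frame(id = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5))
fdf <- as.ffdf(df)

# Export to csv and import again; both steps work chunk-wise on large data.
csvfile <- tempfile(fileext = ".csv")
write.csv.ffdf(fdf, file = csvfile)
back <- read.csv.ffdf(file = csvfile)

n_back <- nrow(back)
values_back <- back$value[]  # pull one column back into RAM

unlink(csvfile)              # remove the illustrative csv file
```

For real datasets the same calls apply, with the chunk size controlled by arguments such as `first.rows`, `next.rows` or `BATCHBYTES`, as described in the help pages for read.table.ffdf and write.table.ffdf.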