WO2022239281A1 - Image processing device, image processing method, and program - Google Patents
Image processing device, image processing method, and program
- Publication number
- WO2022239281A1 (PCT application PCT/JP2021/044138)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- moving image
- captured
- camera
- metadata
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
Description
- The present disclosure relates to an image processing device, an image processing method, and a program, and more particularly to an image processing device, an image processing method, and a program that can provide a highly satisfying video production service for users.
- Patent Literature 1 discloses a program that automatically edits a moving image using a designated template.
- The present disclosure has been made in view of this situation, and is intended to provide a video production service that satisfies users.
- An image processing device according to one aspect of the present disclosure includes a processing unit that acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and on the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- In an image processing method according to one aspect of the present disclosure, an image processing device acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and on the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- A program according to one aspect of the present disclosure causes a computer to execute processing of acquiring captured images to which metadata is added, selecting, based on the temporal length of a moving image to be produced set on a setting screen and on the metadata, captured images to be used for producing the moving image from among the acquired captured images, and producing the moving image using the selected captured images.
- In the image processing device, the image processing method, and the program according to one aspect of the present disclosure, captured images to which metadata is added are acquired; based on the temporal length of the moving image to be produced set on the setting screen and on the metadata, captured images to be used for producing the moving image are selected from among the acquired captured images; and the moving image is produced using the selected captured images.
- The image processing device may be an independent device, or may be an internal block forming part of one device.
- FIG. 1 is a diagram illustrating a configuration example of an embodiment of a video production system to which the present disclosure is applied.
- FIG. 2 is a block diagram showing a configuration example of a camera.
- FIG. 3 is a block diagram showing a configuration example of a cloud server.
- FIG. 4 is a block diagram showing a configuration example of a terminal device.
- FIG. 5 is a diagram showing a method of uploading captured images from a camera to a cloud server.
- FIG. 6 is a diagram showing how proxy images and main images are uploaded.
- FIG. 7 is a diagram showing a first example of a sequence of uploading captured image files.
- FIG. 8 is a diagram showing a second example of a sequence of uploading captured image files.
- FIG. 9 is a flowchart for explaining the overall flow of a video production service.
- FIG. 10 is a flowchart for explaining details of editing processing.
- FIG. 11 is a diagram showing an example of presentation of shot marks.
- FIG. 12 is a diagram showing an example of presentation of motion information of a camera.
- FIG. 13 is a block diagram showing a functional configuration example of a processing unit in the video production system 1.
- FIG. 14 is a diagram showing a first example of a setting screen.
- FIG. 15 is a diagram showing a first example of an editing screen.
- FIG. 16 is a diagram showing a second example of a setting screen.
- FIG. 17 is a diagram showing an example of aspect ratios.
- FIG. 18 is a diagram showing an example of reference times.
- FIG. 19 is a diagram showing an example of templates.
- FIG. 20 is a diagram showing a second example of an editing screen.
- FIG. 21 is a diagram showing a first example of a file management screen.
- FIG. 22 is a diagram showing a second example of a file management screen.
- FIG. 23 is a diagram showing an example of a project registration screen.
- FIG. 24 is a diagram showing a display example of a seventh area.
- FIG. 25 is a diagram showing a third example of a setting screen.
- FIG. 26 is a flowchart for explaining the flow of captured image selection processing and automatic editing processing.
- A diagram showing an example of selection of captured images for each group, and a diagram showing an example of transition periods, are also included among the drawings.
- FIG. 1 is a diagram showing a configuration example of an embodiment of a video production system to which the present disclosure is applied.
- The video production system 1 in FIG. 1 is a system for producing moving images from images captured by a user.
- The video production system 1 is composed of a camera 10, a cloud server 20, and a terminal device 30.
- The camera 10 is a digital camera capable of shooting moving images and still images.
- The camera 10 is not limited to a digital camera, and may be a device having a shooting function, such as a smartphone or a tablet terminal.
- The camera 10 shoots an image of a subject according to the user's operation and records the resulting captured image.
- Captured images include content such as moving images and still images.
- Hereinafter, when it is necessary to distinguish between moving images as captured images and moving images automatically produced by the video production service, the latter will be referred to as produced moving images.
- The captured image captured by the camera 10 is transmitted to the cloud server 20.
- The camera 10 can transmit the captured image to the cloud server 20 via the network 40-1.
- Alternatively, the terminal device 30 may transmit the captured image to the cloud server 20 via the network 40-2.
- The networks 40-1 and 40-2 include communication lines such as the Internet and mobile phone networks.
- The networks 40-1 and 40-2 may be the same network or different networks.
- Hereinafter, the network 40-1 and the network 40-2 will be referred to as the network 40 when there is no need to distinguish between them.
- The cloud server 20 is a server that provides, through the network 40, a video production service that automatically produces produced moving images from captured images.
- The cloud server 20 is an example of an image processing device to which the present disclosure is applied.
- The cloud server 20 receives the captured images captured by the camera 10 via the network 40.
- The cloud server 20 produces a produced moving image by performing processing such as editing on the captured images, and transmits the produced moving image to the terminal device 30 via the network 40.
- The cloud server 20 also generates screens (for example, web pages) such as the setting screen and the editing screen, and transmits them to the terminal device 30 via the network 40.
- The terminal device 30 is a device such as a PC (Personal Computer), a tablet terminal, or a smartphone.
- The terminal device 30 displays the screens such as the setting screen and the editing screen from the cloud server 20 (for example, on the UI (User Interface) of a web browser) and, according to user operations on those screens, performs settings related to the video production service and processing such as editing of the produced moving image.
- The terminal device 30 receives the produced moving image transmitted from the cloud server 20 via the network 40.
- The terminal device 30 records the produced moving image in the terminal and outputs it to the outside.
- FIG. 2 is a block diagram showing a configuration example of the camera 10 of FIG.
- The camera 10 includes a lens system 111, an imaging unit 112, a camera signal processing unit 113, a recording control unit 114, a display unit 115, a communication unit 116, an operation unit 117, a camera control unit 118, a memory unit 119, a driver unit 120, a sensor unit 121, a sound input unit 122, and a sound processing unit 123.
- The lens system 111 takes in incident light (image light) from a subject and causes it to enter the imaging unit 112.
- The imaging unit 112 has a solid-state imaging device such as a CMOS (Complementary Metal Oxide Semiconductor) image sensor, converts the incident light imaged on the imaging surface of the solid-state imaging device by the lens system 111 into an electric signal in units of pixels, and outputs the result as pixel signals.
- The camera signal processing unit 113 is composed of a DSP (Digital Signal Processor), a frame memory for temporarily recording image data, and the like.
- The camera signal processing unit 113 performs various kinds of signal processing on the image signal output from the imaging unit 112 and outputs the image data of the captured image obtained as a result. In this manner, the lens system 111, the imaging unit 112, and the camera signal processing unit 113 constitute an imaging system.
- The recording control unit 114 records the image data of the captured image captured by the imaging system in a storage medium, including a memory card such as a flash memory.
- The display unit 115 is composed of a liquid crystal display, an organic EL display, or the like, and displays the captured image captured by the imaging system.
- The communication unit 116 is composed of a communication module or the like compatible with a predetermined communication method, such as wireless communication including wireless LAN and cellular communication (for example, 5G (5th Generation)), and transmits the image data captured by the imaging system to other devices over the network.
- The operation unit 117 includes an operation system such as physical buttons and a touch panel, and issues operation commands for the various functions of the camera 10 according to the user's operations.
- The camera control unit 118 is composed of a processor such as a CPU (Central Processing Unit) or a microprocessor, and controls the operation of each unit of the camera 10.
- The memory unit 119 records various data under the control of the camera control unit 118.
- The driver unit 120 drives the lens system 111 to achieve autofocus, zooming, and the like under the control of the camera control unit 118.
- The sensor unit 121 senses spatial information, time information, and the like, and outputs sensor signals obtained as a result of the sensing.
- The sensor unit 121 includes various sensors such as a gyro sensor and an acceleration sensor.
- The sound input unit 122 is composed of a microphone or the like, detects sounds such as the user's voice (speech) and environmental sounds, and outputs sound signals obtained as a result.
- The sound processing unit 123 performs sound signal processing on the sound signal output from the sound input unit 122.
- The sound signal from the sound processing unit 123 is input to the camera signal processing unit 113, processed in synchronization with the image signal under the control of the camera control unit 118, and recorded as the sound (audio) of the moving image.
- In the camera 10, various metadata can be added to captured images, including moving images and still images.
- For example, when image plane phase difference pixels are arranged in the pixel region of the solid-state imaging device, information obtained from the image plane phase difference pixels can be added as metadata (image plane phase difference pixel information meta).
- Information about autofocus by the camera control unit 118 and the driver unit 120 may be given as metadata (focus meta).
- The sensor unit 121 can add information obtained from a sensor such as a gyro sensor as metadata (gyro meta, etc.).
- Information regarding a sound input device, such as the camera's built-in microphone, may also be added as metadata.
- A shot mark may be added to a captured image, that is, a captured moving image or still image, in accordance with the user's operation of the operation unit 117.
- When the user operates the operation unit 117, which includes an operation system such as buttons and a touch panel UI, a shot mark is added to the target captured image.
- A shot mark is a "mark" given by the user at a desired timing, and can also be said to be metadata added to a captured image.
- Image plane phase difference pixel information, autofocus information, sensor information, sound input device information, and shot marks are examples of metadata added by the camera 10; other information processed inside the camera 10 may also be added as metadata.
- FIG. 3 is a block diagram showing a configuration example of the cloud server 20 of FIG.
- In the cloud server 20, a CPU 211, a ROM (Read Only Memory) 212, and a RAM (Random Access Memory) 213 are interconnected by a bus 214.
- An input/output I/F 215 is further connected to the bus 214 .
- An input unit 216 , an output unit 217 , a storage unit 218 and a communication unit 219 are connected to the input/output I/F 215 .
- The input unit 216 supplies various input signals to each unit, including the CPU 211, via the input/output I/F 215.
- The input unit 216 is composed of a keyboard, a mouse, a microphone, and the like.
- The output unit 217 outputs various information under the control of the CPU 211 via the input/output I/F 215.
- The output unit 217 is composed of a display, a speaker, and the like.
- The storage unit 218 is configured as an auxiliary storage device such as a semiconductor memory or an HDD (Hard Disk Drive).
- The storage unit 218 records various data and programs under the control of the CPU 211.
- The CPU 211 reads various data from the storage unit 218, processes the data, and executes programs.
- The communication unit 219 is composed of a communication module that supports wireless communication such as wireless LAN or cellular communication (e.g., 5G), or wired communication.
- The communication unit 219 communicates with other devices, including the camera 10 and the terminal device 30, via the network 40 under the control of the CPU 211.
- The configuration of the cloud server 20 shown in FIG. 3 is an example; image processing may also be performed by providing a dedicated processor such as a GPU (Graphics Processing Unit).
- FIG. 4 is a block diagram showing a configuration example of the terminal device 30 of FIG.
- In the terminal device 30, a CPU 311, a ROM 312, and a RAM 313 are interconnected by a bus 314.
- An input/output I/F 315 is further connected to the bus 314 .
- An input unit 316 , an output unit 317 , a storage unit 318 and a communication unit 319 are connected to the input/output I/F 315 .
- The input unit 316 supplies various input signals to each unit, including the CPU 311, via the input/output I/F 315.
- The input unit 316 has an operation unit 321.
- The operation unit 321 includes a keyboard, a mouse, a microphone, physical buttons, a touch panel, and the like. The operation unit 321 is operated by the user and supplies an operation signal corresponding to the operation to the CPU 311.
- The output unit 317 outputs various information under the control of the CPU 311 via the input/output I/F 315.
- The output unit 317 has a display unit 331 and a sound output unit 332.
- The display unit 331 is composed of a liquid crystal display, an organic EL display, or the like.
- The display unit 331 displays captured images, the editing screen, and the like under the control of the CPU 311.
- The sound output unit 332 is composed of a speaker, headphones connected to an output terminal, or the like. The sound output unit 332 outputs sound corresponding to the sound signal under the control of the CPU 311.
- The storage unit 318 is configured as an auxiliary storage device such as a semiconductor memory.
- The storage unit 318 may be configured as internal storage, or may be external storage such as a memory card.
- The storage unit 318 records various data and programs under the control of the CPU 311.
- The CPU 311 reads various data from the storage unit 318, processes the data, and executes programs.
- The communication unit 319 is composed of a communication module compatible with a predetermined communication method, such as wireless communication including wireless LAN and cellular communication (e.g., 5G), or wired communication.
- The communication unit 319 communicates with other devices via the network under the control of the CPU 311.
- The configuration of the terminal device 30 shown in FIG. 4 is an example; image processing may also be performed by providing a dedicated processor such as a GPU.
- In the video production system 1, the captured images captured by the camera 10 are taken into the cloud server 20, and processing such as editing is performed using the captured images and the attached metadata to produce a produced moving image.
- The terminal device 30 displays information about the captured images and produced moving images held on the cloud server 20 on screens such as the editing screen, so that the user can edit them.
- The cloud server 20 is installed in a data center or the like; it is not limited to a single server, and may be composed of a plurality of servers that provide the video production service.
- Files of the captured images captured by the camera 10 are uploaded to the cloud server 20 via the network 40 and processed, for example, by the method shown in FIG. 5.
- FIG. 5 is a diagram showing a method of linking the camera 10 and the cloud server 20 and uploading the captured image from the camera 10 to the cloud server 20.
- A to F of FIG. 5 show exchanges between the camera 10 and the cloud server 20 in chronological order. Processing between the camera 10 and the cloud server 20 is performed in three stages: camera registration, camera connection, and file upload.
- At the camera registration stage, the processing shown in A of FIG. 5 is performed. That is, the camera 10 connects to the cloud server 20 via the network 40 to use the video production service, and performs device registration according to a user operation or the like (A in FIG. 5).
- At the camera connection stage, the processes shown in B to E of FIG. 5 are performed. That is, the camera 10 performs main body settings according to the user's operation or the like, turns on cloud cooperation, and completes device registration (B in FIG. 5). Also, the camera 10 notifies the cloud server 20 of power-on using a communication protocol such as MQTT (Message Queuing Telemetry Transport) (C in FIG. 5).
- The cloud server 20, having received the notification from the camera 10, uses a communication protocol such as MQTT to notify the camera 10 of a command and a connection destination for shifting to WebRTC (Web Real-Time Communication) communication (D in FIG. 5).
- WebRTC communication is then performed between the camera 10 and the cloud server 20, and the file upload destination, which uses an image transfer protocol such as PTP-IP (Picture Transfer Protocol over TCP/IP networks), is notified (E in FIG. 5).
- At the file upload stage, the process shown in F of FIG. 5 is performed. That is, the camera 10, triggered by a request from the cloud server 20 (a PULL request), starts uploading files of captured images, including moving images and still images, via the network 40 (F in FIG. 5). At this time, which files are uploaded from the camera 10 side, such as the main image or the proxy image, can be selected on the cloud server 20 side.
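- As a concrete illustration of the exchange in C and D of FIG. 5, the following is a minimal camera-side sketch using the paho-mqtt Python client. The broker address, topic layout, and payload fields are illustrative assumptions, not part of the disclosure, which only specifies that an MQTT-like protocol carries the power-on notification and the WebRTC hand-off command.

```python
# Minimal camera-side sketch of C and D in FIG. 5 using paho-mqtt 2.x.
# Broker address, topics, and payload fields are illustrative assumptions.
import json
import paho.mqtt.client as mqtt

BROKER = "cloud.example.com"                  # hypothetical cloud endpoint
NOTIFY_TOPIC = "cameras/cam-001/status"       # hypothetical topic layout
COMMAND_TOPIC = "cameras/cam-001/commands"

def on_message(client, userdata, msg):
    # D in FIG. 5: the cloud server replies with a command and a
    # connection destination for shifting to WebRTC communication.
    command = json.loads(msg.payload)
    if command.get("action") == "start_webrtc":
        print("connect WebRTC to:", command["connection_destination"])

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(COMMAND_TOPIC)

# C in FIG. 5: notify the cloud server of power-on over MQTT.
client.publish(NOTIFY_TOPIC, json.dumps({"event": "power_on"}))
client.loop_forever()
```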
- The camera 10 can embed metadata in the captured images to be uploaded.
- The cloud server 20 performs automatic selection, automatic trimming, automatic quality correction, and the like by processing using the metadata embedded in the captured images, and produces produced moving images (automatic moving image production).
- The proxy image is an image with a lower resolution than the main image.
- The camera 10 can simultaneously record a main image, which is a high-resolution captured image, and a proxy image, which is a low-resolution captured image. Thereby, the camera 10 can upload the proxy image and the main image at different timings. That is, the captured images include not only main images but also proxy images. For example, a main image and a proxy image are recorded for each moving image and still image.
- FIG. 6 is a diagram showing how the proxy image and the main image are uploaded.
- First, the cloud server 20 makes a PULL request for the proxy images, and the proxy image files are uploaded from the camera 10.
- The cloud server 20 uses the uploaded proxy image files to determine the captured images to be used for automatic moving image production.
- Then, the cloud server 20 makes a PULL request for the main images corresponding to the determined captured images, and the main image files are uploaded from the camera 10.
- The cloud server 20 uses the uploaded main image files to automatically produce a moving image.
- In this way, the cloud server 20 can first request the camera 10 to upload only the proxy images, determine the captured images to be used for automatic moving image production using the proxy images, and then request the camera 10 to upload, and thus pull out, only the main images used for automatic moving image production.
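- The proxy-first flow of FIG. 6 can be summarized in code as follows. This is a minimal sketch: the FakeCamera class stands in for the PTP-IP/WebRTC upload channel, and the selection rule (keep shot-marked clips) is an illustrative assumption.

```python
# Sketch of the two-phase PULL flow in FIG. 6: fetch lightweight proxy
# files first, decide which clips the produced moving image needs, then
# fetch only the corresponding main (high-resolution) files.
from dataclasses import dataclass, field

@dataclass
class FakeCamera:
    """Stand-in for the PTP-IP/WebRTC upload channel to the camera."""
    files: dict = field(default_factory=dict)  # clip_id -> {"proxy":..., "main":...}

    def request_upload(self, clip_id, kind):
        # In the real system this is a PULL request over the network.
        return self.files[clip_id][kind]

def select_for_production(proxies):
    # Placeholder decision: keep clips whose metadata carries a shot mark.
    return [cid for cid, p in proxies.items() if p["meta"].get("shot_mark")]

def produce_with_proxy_first(camera, clip_ids):
    # Phase 1: PULL only the lightweight proxy files.
    proxies = {cid: camera.request_upload(cid, "proxy") for cid in clip_ids}
    # Decide which clips the produced moving image needs.
    selected = select_for_production(proxies)
    # Phase 2: PULL the heavy main files for the selected clips only.
    return {cid: camera.request_upload(cid, "main") for cid in selected}

cam = FakeCamera(files={
    "c1": {"proxy": {"meta": {"shot_mark": True}}, "main": "c1_main.mp4"},
    "c2": {"proxy": {"meta": {}}, "main": "c2_main.mp4"},
})
mains = produce_with_proxy_first(cam, ["c1", "c2"])  # -> only c1's main file
```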
- FIG. 7 is a diagram showing a first example of a sequence for uploading a photographed image file.
- First, the terminal device 30 requests from the cloud server 20, via the network 40, the in-camera captured image list, which is a list of the captured images recorded in the camera 10 (S11).
- The cloud server 20 requests the in-camera captured image list from the camera 10 via the network 40 (S12).
- The camera 10 receives the request from the cloud server 20 via the network 40 and transmits (returns) the captured image list corresponding to the request (S13).
- The cloud server 20 transmits (returns) the captured image list from the camera 10 to the terminal device 30 via the network 40 (S14).
- On the terminal device 30, the captured images to be used in automatic moving image production by the cloud server 20 are selected from the captured image list received from the cloud server 20.
- At this time, the terminal device 30 can present the captured image list and select desired captured images according to the user's operation.
- The terminal device 30 transmits a proxy image request list for the captured images to be used on the cloud side to the cloud server 20 via the network 40 (S15).
- The cloud server 20 transmits the proxy image request list from the terminal device 30 to the camera 10 via the network 40 (S16).
- The camera 10 receives the proxy image request list from the cloud server 20 via the network 40 and uploads the proxy images according to the list to the cloud server 20 (S17).
- Various metadata (camera metadata) are added to the proxy images.
- In the cloud server 20, the proxy image files uploaded from the camera 10 are sequentially recorded in the storage unit 218.
- The cloud server 20 transmits the proxy images uploaded by the camera 10 to the terminal device 30 via the network 40 (S18).
- The terminal device 30 analyzes the metadata added to the proxy images from the cloud server 20 and selects the captured images for which upload of the main image is requested (S19). At this time, the terminal device 30 can present information about the proxy images and metadata, and desired captured images can be selected according to the user's operation. The terminal device 30 transmits the main image request list to the cloud server 20 via the network 40 (S20).
- The cloud server 20 transmits the main image request list from the terminal device 30 to the camera 10 via the network 40 (S21).
- The camera 10 receives the main image request list from the cloud server 20 via the network 40 and uploads the main images corresponding to the list to the cloud server 20 (S22).
- When the captured image uploaded as the main image is a moving image, it may be the whole or a part of one moving image. In other words, the whole or a part of one moving image can be clipped and uploaded as the main image.
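- For example, clipping part of one moving image before upload could be done with ffmpeg's stream copy, as sketched below. The file names and in/out points are illustrative; the disclosure does not specify the clipping tool.

```python
# Sketch: clipping part of one moving image before uploading it as the
# main image, using ffmpeg stream copy (no re-encode). Note that with
# "-c copy", ffmpeg cuts on the nearest keyframes.
import subprocess

def clip_for_upload(src, dst, in_point, out_point):
    subprocess.run(
        ["ffmpeg", "-ss", str(in_point), "-to", str(out_point),
         "-i", src, "-c", "copy", dst],
        check=True,
    )

clip_for_upload("main_full.mp4", "main_part.mp4", 12.0, 27.5)
```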
- In the cloud server 20, the main image files uploaded from the camera 10 are sequentially recorded in the storage unit 218.
- The cloud server 20 transmits the main images uploaded by the camera 10 to the terminal device 30 via the network 40 (S23).
- Thereafter, moving image production processing, such as editing processing using the main images, is performed in cooperation with the cloud server 20 as necessary (S24).
- Note that the camera 10 may upload the metadata of the captured image being shot to the cloud server 20, and the cloud server 20 may request the camera 10 to upload the proxy image based on that metadata before shooting ends.
- Similarly, the camera 10 may upload the metadata and the proxy image of the image being shot to the cloud server 20, and the cloud server 20 may request the camera 10 to upload the main image based on the metadata and the proxy image before shooting ends.
- Alternatively, the captured image files recorded in the camera 10 may be transferred to the terminal device 30, and the terminal device 30 may upload the captured image files to the cloud server 20.
- FIG. 8 is a diagram showing a second example of a sequence for uploading a photographed image file.
- First, the camera 10 transfers the captured images to the terminal device 30 (S31).
- The terminal device 30 records the transferred captured image files in the storage unit 318.
- The captured image files can be transferred using a memory card such as a flash memory, wireless communication such as wireless LAN, or wired communication conforming to a standard such as USB.
- The terminal device 30 accesses the web page provided by the cloud server 20 via the network 40 according to location information such as a URL (Uniform Resource Locator) (S32).
- The cloud server 20 transmits the file management screen via the network 40 in response to the access from the terminal device 30 (S33).
- On the terminal device 30, the file management screen from the cloud server 20 is presented, and the files of the captured images to be uploaded are specified, from among the in-terminal captured images recorded in the storage unit 318, according to the user's operation (S34).
- The terminal device 30 uploads the specified captured images to the cloud server 20 via the network 40 (S35).
- In the cloud server 20, the captured image files uploaded from the terminal device 30 are sequentially recorded in the storage unit 218, and when the upload of the captured images is completed, the terminal device 30 is notified of the completion of the upload via the network 40 (S36).
- Thereafter, moving image production processing such as editing processing using the captured images is performed (S37).
- In this example, the main image and the proxy image are not distinguished as captured images.
- FIG. 9 is a flowchart showing the flow of the video production service provided by the video production system 1.
- First, the camera 10 performs shooting (S111), and the captured images, such as moving images and still images, obtained by the shooting are uploaded to and taken into the cloud server 20 (S112). The upload of the captured image files can be performed, for example, by any of the methods shown in FIGS. 5 to 8 described above.
- When the captured images have been taken in, the cloud server 20 performs editing processing (S113).
- In the editing processing, processes such as selection of the template used in automatic editing, automatic editing and manual editing of the captured images (clips), and sound processing are performed. Details of the editing processing will be described later with reference to the flowchart of FIG. 10.
- Note that a captured image taken into a device such as the cloud server 20 is also referred to as a clip.
- The final produced moving image is produced by connecting the moving images obtained by automatic editing through the editing processing, and the produced moving image is distributed and shared (S114).
- More specifically, moving image production is performed in the following flow. That is, first, the cloud server 20 creates a project for managing information related to moving image production according to the user's operation, and instructs the camera 10 to start taking in captured images.
- The cloud server 20 requests the camera 10 to upload the proxy images (PULL request).
- The cloud server 20 takes in the proxy images from the camera 10 (S112).
- The cloud server 20 performs editing processing (S113), creates a preliminary produced moving image from the taken-in proxy images, and distributes it to the terminal device 30 or the like via the network 40 to present it to the user.
- In the editing processing here, if a captured image is a moving image, processing such as cutting out the image frames near a shot mark, object recognition, and voice climax recognition is performed, and the preliminary produced moving image is produced according to these processes.
- The cloud server 20 then makes an upload request (PULL request) to the camera 10 for the main images, so that only the main images of the captured images necessary for the preliminary produced moving image are further taken in.
- The cloud server 20 takes in the main images from the camera 10 (S112).
- The cloud server 20 performs the editing processing again (S113) and creates the final produced moving image (completed moving image) from the taken-in main images.
- The produced moving image produced in this manner is distributed to the terminal device 30 or the like via the network 40 (S114) and presented to the user.
- In the editing processing, a template selection process (S131), a captured image selection process (S132), an automatic editing process (S133), a manual editing process (S134), and a sound process (S135) are performed.
- In the template selection process, a template to be used for automatic editing is selected according to the user's operation (S131).
- In the captured image selection process, arbitrary captured images are selected (automatically or manually) from among the taken-in captured images (S132).
- In the captured image selection process, AI technology is used to recognize captured images shot in the same scene, and a function is provided to group captured images recognized as being the same scene. In other words, a selection function is provided for the case where a plurality of captured images are shot for one scene.
- For example, similar captured images can be grouped based on image information and shooting time information obtained from each of the taken-in captured images.
- From among the captured images shot in the same scene, the captured image to be used for producing the produced moving image can then be selected automatically. Also, as an aid to manual selection, the captured images grouped by scene may be presented. This makes it easier for the user to select, from among captured images with the same subject and composition, the captured images that the user actually wants to use in the produced moving image.
- Automatic selection and selection assistance of captured images can also be performed by the following processing. That is, when the captured images are moving images, it is possible, based on the sound recorded during shooting, to preferentially extract and select, for example, a moving image clip that includes the spoken word "OK". Shot marks may also be used to assist automatic or manual selection of captured images.
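- A minimal sketch of such preferential extraction is shown below: clips whose recorded sound contains the spoken word "OK", or that carry a shot mark, are ranked first. The clip structure and the transcript field (assumed to come from a separate speech recognition step) are illustrative assumptions.

```python
# Rank clips for automatic selection: a spoken "OK" in the recorded
# sound and a user-given shot mark both raise a clip's priority.
def selection_priority(clip):
    words = [w.strip(",.!?") for w in clip.get("transcript", "").lower().split()]
    score = 0
    if "ok" in words:
        score += 2   # a spoken "OK" suggests a take the photographer approved
    if clip.get("shot_mark"):
        score += 1   # explicit user mark given at shooting time
    return score

clips = [
    {"id": "clip1", "transcript": "one more time", "shot_mark": False},
    {"id": "clip2", "transcript": "OK, that's the one", "shot_mark": True},
]
best_first = sorted(clips, key=selection_priority, reverse=True)  # clip2 first
```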
- A viewer may also be presented so that captured images to which shot marks were added according to the user's (photographer's) operation at the time of shooting can be identified.
- For example, a shot mark is attached to a captured image that the user recommends for use in an advertisement.
- The viewer can be displayed on the terminal device 30.
- In the example shown, the captured image 511-5 is given a shot mark 521-1.
- Likewise, the captured image 512-4 is given a shot mark 521-2, and shot marks 521-3 and 521-4 are given to the captured images 513-4 and 513-6, respectively, among the captured images 513-1 to 513-6 grouped in the same group.
- Camera work may also be visualized on the viewer using information on the movement of the camera 10 (gyro meta, etc.).
- Using the parameters (face frame metadata) related to the frame superimposed on a human face detected by the camera 10 at the time of shooting, it is possible to perform image processing that cuts out the face region included in the captured image and applies camera work such as panning and zooming to it.
- The face frame metadata is metadata including the in-focus position, the size of the face, and the like.
- In the example shown, a face frame 522 is superimposed on the face region included in the captured image 514, and camera work information 523, indicated by arrows in the figure, displays, for example, information indicating that the camera 10 zoomed in or out, or that it was panned to the left or right.
- When the captured image is a moving image, information about the positions where sound (voice) is included may be visualized.
- At this time, a so-called automatic voice transcription function or the like may be used to display character information based on the speech of the speaker.
- Recognition processing for recognizing objects included in the captured images may also be performed to extract and display captured images that include a desired object. For example, by subjecting the captured images to face recognition processing, it is possible to extract captured images in which a specific person (for example, Mr. A) appears.
- In the automatic editing process, automatic editing is performed using the captured images selected in the captured image selection process (S133).
- In the automatic editing, processing such as automatic trimming, which automatically selects the in point and out point of a moving image, and automatic quality correction, which corrects a captured image (clip) to improve its quality, is performed.
- In the automatic quality correction, it is possible to remove the influence of camera shake from a captured image by performing shake removal processing using the information (gyro meta, etc.) related to the movement of the camera 10.
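- As one simple illustration of gyro-based shake removal, the sketch below counter-rotates each frame by a per-frame roll angle assumed to have been integrated from the gyro meta. Real stabilization also handles translation, rolling shutter, and trajectory smoothing; this shows only the core idea.

```python
# Counter-rotate a frame about its center by the camera roll measured
# by the gyro, so the horizon stays level. The roll angle is assumed to
# come from integrating the gyro metadata for this frame.
import cv2
import numpy as np

def derotate(frame, roll_deg):
    h, w = frame.shape[:2]
    # Rotate by the opposite of the measured camera roll.
    m = cv2.getRotationMatrix2D((w / 2, h / 2), -roll_deg, 1.0)
    return cv2.warpAffine(frame, m, (w, h))

frame = np.zeros((1080, 1920, 3), np.uint8)  # stand-in for a video frame
stabilized = derotate(frame, roll_deg=1.8)   # 1.8 degrees from gyro meta
```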
- Also, processing such as panning and zooming may be performed by recognizing the main subject using the focus meta.
- Furthermore, the captured images may be corrected using the metadata added to the captured images at the time of shooting or using AI technology.
- For example, the metadata can include information about WB (White Balance) and brightness.
- Correction using a LUT (Lookup Table), which is a table used when converting colors and the like, may also be performed.
- In addition, processing may be performed to make the brightness and color of the captured images uniform.
- Captured images differ in brightness and color depending on the subject and the light conditions at the time of shooting.
- Therefore, correction processing is performed to make the brightness and color of the target captured images uniform.
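- One common way to make brightness and color uniform across clips is to match per-channel statistics to a reference frame, as sketched below in LAB color space with OpenCV. This is an illustrative technique, not necessarily the correction method of the disclosure.

```python
# Match a frame's per-channel mean and standard deviation in LAB space
# to a reference frame, so brightness (L) and color (A, B) line up
# across clips shot under different light conditions.
import cv2
import numpy as np

def match_tone(frame_bgr, ref_bgr):
    src = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std()
        src[..., c] = (src[..., c] - s_mean) * (r_std / s_std) + r_mean
    out = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```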
- In the manual editing process, editing processing is performed, according to the user's operation, on the captured images selected in the captured image selection process and on the produced moving image produced by the automatic editing (S134).
- The user can instruct editing processing for the produced moving image by operating the UI of the editing screen displayed on the terminal device 30.
- In the manual editing, additional editing, such as replacing a moving image produced by automatic editing with a preferred moving image or still image or changing the cut-out time, is performed as necessary. Note that if the user determines that the produced moving image does not need editing, the manual editing process need not be performed.
- In the sound process, processing related to the sound of the produced moving image is performed (S135).
- For example, wind noise reduction processing can be performed by AI technology, sound signal processing, or the like.
- In the sound process, noise such as wind noise can be removed from the sound of the moving image, and the volume of people's speech can be made uniform.
- Wind noise is annoying to the viewers of a moving image, but preventing wind noise from being recorded at shooting time requires some effort from the user, such as attaching a wind jammer accessory. And if wind noise is recorded during shooting, professional editing, such as using an equalizer, is required to remove it manually. In the sound process, noise such as wind noise is automatically removed when the captured images are edited, so the user can easily remove the noise without performing any operation.
- Also, the distance between a person and the microphone varies depending on the shooting location; even for people shot at the same time, the distance to the microphone differs depending on each person's position, so the volume of their speech varies.
- Conventionally, making the speech volume uniform required time-consuming editing, such as assigning the microphone of each speaker to a different channel, saving the sound as separate audio files, and adjusting the volume individually; in the sound process, such volume leveling is performed automatically.
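- The volume leveling described above can be illustrated with a simple RMS-based sketch: each clip's mono audio is scaled toward a common target level. A production system would more likely use a perceptual loudness measure (e.g., LUFS) and per-speaker segmentation; the target value here is an assumption.

```python
# Scale each clip's mono audio (float samples in [-1, 1]) so its RMS
# level matches a common target, evening out quiet and loud speakers.
import numpy as np

def level_volume(samples, target_rms=0.1):
    rms = np.sqrt(np.mean(np.square(samples)))
    if rms < 1e-9:
        return samples                      # silence: leave untouched
    out = samples * (target_rms / rms)
    return np.clip(out, -1.0, 1.0)          # guard against clipping overshoot

clip_a = np.random.uniform(-0.02, 0.02, 48000)   # quiet speaker
clip_b = np.random.uniform(-0.40, 0.40, 48000)   # loud speaker
leveled = [level_volume(c) for c in (clip_a, clip_b)]
```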
- When step S135 ends, the process returns to step S113 of FIG. 9, and the subsequent processes are executed.
- In addition, processing for changing the focus position may be performed after shooting.
- Processing related to XR (Extended Reality) using depth information obtained by a ranging sensor may also be performed.
- Metadata indicating coordinate information about where the camera 10 captured the image may be combined with recognition processing, by the cloud server 20, of the names of objects and persons in the captured image.
- In this way, the name of an object or person that was in focus during shooting can be converted into character information and displayed as auxiliary data for the manual selection of captured images.
- Regarding the position of sound in a moving image, the sound in the moving image recorded by the main body of the camera 10 and the separately recorded sound captured by a recorder such as an IC recorder or PCM recorder may each be recognized by voice recognition processing, and the sounds (voices) may be synchronized based on the time at which the same sentence is uttered.
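- The disclosure synchronizes on the time at which the same sentence is recognized in both recordings; a related low-level alternative, sketched below, estimates the offset between the camera audio and the separately recorded audio by cross-correlating the raw waveforms.

```python
# Estimate how far the separately recorded track lags the camera track
# by cross-correlating the two waveforms. For long recordings an
# FFT-based correlation (e.g., scipy.signal.fftconvolve) is faster.
import numpy as np

def estimate_offset(cam_audio, rec_audio, rate):
    corr = np.correlate(rec_audio, cam_audio, mode="full")
    lag = corr.argmax() - (len(cam_audio) - 1)
    return lag / rate            # seconds to shift the recorder track

rate = 8000
t = np.arange(rate) / rate
cam = np.sin(2 * np.pi * 440 * t)            # tone as a stand-in for speech
rec = np.concatenate([np.zeros(1200), cam])  # recorder started 0.15 s late
print(estimate_offset(cam, rec, rate))       # ~0.15
```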
- By learning, through machine learning, the WB, exposure, and other adjustments made manually by a creator, and generating a trained model (for example, a DNN (Deep Neural Network)), WB and exposure corrections (automatic quality correction) can be applied to captured images in subsequent productions using the trained model. Furthermore, even when multiple people work using the trained model or take over the work, corrections with the same WB, the same exposure, and so on can be applied consistently. In this way, each user can use a trained model that has been trained using the creator's production data as learning data.
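- A sketch of this reuse of a creator's adjustments is shown below. The trained model is replaced by a placeholder that returns fixed per-channel gains and an exposure factor; in the disclosure, the model (for example, a DNN) would infer such correction parameters from the image.

```python
# Apply a learned "look" to a new captured image: a model predicts
# correction parameters, which are then applied to the pixels.
import numpy as np

def placeholder_model(image):
    # Stand-in for the trained model: a real DNN would infer these
    # per-channel gains and the exposure factor from the image content.
    return {"gain_rgb": np.array([1.04, 1.00, 0.94]), "exposure": 1.10}

def apply_learned_look(image):
    params = placeholder_model(image)
    out = image.astype(np.float32) * params["gain_rgb"] * params["exposure"]
    return np.clip(out, 0, 255).astype(np.uint8)

frame = np.full((4, 4, 3), 128, np.uint8)   # stand-in RGB frame
corrected = apply_learned_look(frame)
```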
- As described above, the video production system 1 provides, as a system, the series of user operations for realizing moving image editing.
- Video editing requires the selection of suitable captured images and a combination of multiple editing operations, which makes it difficult for users to learn editing techniques.
- In the video production system 1, for example by following the steps (a) to (e) below, even a user who has no or little knowledge of video editing can easily edit moving images and create a desired produced moving image.
- FIG. 13 is a block diagram showing a functional configuration example of the processing unit 200 in the video production system 1.
- The processing unit 200 is realized by a processor such as the CPU 211 or a GPU of the cloud server 20 executing a program such as a moving image production program.
- The processing unit 200 may also be implemented as a dedicated circuit.
- The processing unit 200 has a captured image acquisition unit 251, a metadata extraction unit 252, an operation information acquisition unit 253, a captured image selection unit 254, and an editing unit 255.
- The captured image acquisition unit 251 acquires the captured images uploaded from the camera 10 or the terminal device 30 via the network 40 and supplies them to the metadata extraction unit 252.
- The metadata extraction unit 252 extracts the metadata added to the captured images supplied from the captured image acquisition unit 251 and supplies it to the captured image selection unit 254 together with the captured images.
- When a captured image to which no metadata is added is supplied, the metadata extraction unit 252 supplies it to the captured image selection unit 254 as it is.
- The operation information acquisition unit 253 acquires operation information related to operations on screens such as the setting screen and the editing screen, transmitted from the terminal device 30 via the network 40, and supplies it to the captured image selection unit 254 or the editing unit 255.
- The captured image selection unit 254 is supplied with the metadata and captured images from the metadata extraction unit 252 and with the operation information from the operation information acquisition unit 253. Based on the operation information and the metadata, the captured image selection unit 254 selects the captured images to be used for producing the produced moving image from among the captured images and supplies the selected captured images to the editing unit 255.
- The operation information includes information indicating the temporal length of the produced moving image set on the setting screen.
- The metadata includes the camera metadata added to the captured image by the camera 10 at the time of shooting. More specifically, the metadata includes the shot marks added to the captured images according to the user's operation. Although the details will be described later, the captured image selection unit 254 can select the captured images to be used for producing the produced moving image based on the temporal length of the produced moving image and the shot marks.
- The editing unit 255 produces the produced moving image by performing automatic editing processing, including processing such as automatic trimming and automatic quality correction, using the selected captured images supplied from the captured image selection unit 254. Although the details will be described later, in the automatic quality correction, correction processing such as brightness correction and color correction can be performed. Further, when editing information set on the editing screen is supplied as operation information from the operation information acquisition unit 253, the editing unit 255 can perform the automatic editing processing using the editing information. The produced moving image is, for example, distributed to the terminal device 30 via the network 40 or shared on the network 40.
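- The data flow through the functional blocks of FIG. 13 can be sketched as a simple pipeline, as below. The data shapes, the 5-seconds-per-clip assumption, and the shot-mark-first selection rule are illustrative; only the block structure follows the description above.

```python
# Structural sketch of the processing unit 200 of FIG. 13:
# acquisition -> metadata extraction -> selection -> editing.
from dataclasses import dataclass

@dataclass
class Captured:
    content_id: str
    frames: object          # image/video payload (opaque here)
    meta: dict              # camera metadata, including shot marks

def extract_metadata(captured):                     # metadata extraction unit 252
    return captured.meta or {}

def select_images(captured_list, target_seconds):   # captured image selection unit 254
    # Prefer shot-marked clips, then cut off when the produced moving
    # image's set temporal length is filled (5 s per clip assumed).
    marked = sorted(captured_list,
                    key=lambda c: not extract_metadata(c).get("shot_mark"))
    return marked[: max(1, target_seconds // 5)]

def edit(selected):                                  # editing unit 255
    # Stand-in for automatic trimming / quality correction / concatenation.
    return [c.content_id for c in selected]

uploads = [Captured("a", None, {"shot_mark": True}),
           Captured("b", None, {}),
           Captured("c", None, {"shot_mark": True})]
produced = edit(select_images(uploads, target_seconds=15))
```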
- FIG. 14 is a diagram showing a first example of a setting screen used before shooting.
- The setting screen is displayed on the display unit 331 of the terminal device 30.
- On the setting screen, the aspect ratio, the temporal length (approximate time), and the number of clips (the number of captured images) can be set as the production conditions 611A used when the cloud server 20 produces the produced moving image.
- When the production conditions are set, (the frames of) a storyboard for the produced moving image are generated.
- A setting screen 612 is used to set a template.
- The template can be edited according to the storyboard.
- On the setting screen 612, a playback area 612A for playing back a sample moving image, a setting area 612B for setting the music used in the produced moving image and the brightness and color of the produced moving image, a switch button 612C operated when switching templates, and a save button 612D operated when saving the template are displayed.
- When the switch button 612C is pressed, the screen transitions from the setting screen 612 to a selection screen 613. On the selection screen 613, the template to be used can be switched by selecting a desired template from the existing template group 613A and pressing the OK button 613B.
- When the save button 612D is pressed, the contents of the template displayed on the setting screen 612 are saved.
- On the setting screen 612, a setting area 612E for setting the character (caption) insertion length for each captured image (clip), switching information 612F indicating the switching effect between captured images (between clips), a preview button 612G operated to confirm the content when the template is applied, and an OK button 612H operated to determine the template are also displayed.
- When the OK button 612H is pressed, the content of the template displayed on the setting screen 612 is set and used when producing the moving image. After performing such setting operations, the user starts shooting with the camera 10, so that the captured images obtained by the shooting are processed in accordance with the template and a produced moving image is produced. In this way, simply by the user setting the template in advance, the captured images are associated with moving image production, so that the work of producing a moving image is facilitated.
- Note that manual editing can be performed as appropriate according to user operations.
- For manual editing, the editing screen shown in FIG. 15 can be used.
- The editing screen is displayed on the display unit 331 of the terminal device 30.
- The first area 615A displays the captured images (clips) selected for the storyboard when the template was set.
- When a shot mark has been added to a target captured image, information indicating the shot mark may be superimposed on it.
- The captured images (clips) displayed in the first area 615A have already been subjected to correction processing such as shake removal and sound processing.
- Likewise, the captured images (clips) displayed in time series in the second area 615B have already been subjected to correction processing so that their brightness and colors are uniform.
- Alternatively, a setting screen such as that shown in FIG. 16 may be used to set the approximate time and the template for the produced moving image.
- FIG. 16 is a diagram showing a second example of a setting screen used when producing a moving image.
- The setting screen of FIG. 16 will be described with reference to the tables of FIGS. 17 to 19 as appropriate.
- The setting screen 621 includes a title designation section 621A for designating the title of the project and the like, an aspect ratio designation section 621B for designating the aspect ratio of the produced moving image, and an approximate time designation section 621C for designating the temporal length of the produced moving image.
- The setting screen 621 also includes a template selection section 621D for selecting a desired template, a template display section 621E for displaying the selected template, and a creation button 621F for instructing creation of a project.
- In the title designation section 621A, the title of the produced moving image or project, memos regarding the produced moving image or project, and the like are input according to the user's operation.
- In the aspect ratio designation section 621B, the aspect ratio of the produced moving image is designated according to the user's operation. For example, as shown in FIG. 17, with 16:9 as the initial value, aspect ratios such as 1:1 and 9:16 can be selected.
- In recent years, smartphones and tablet terminals are increasingly used as devices for watching moving images, and moving images are increasingly displayed and viewed as part of the UI of SNS (Social Networking Service) platforms and websites. Therefore, the user can change the aspect ratio of the produced moving image according to the environment in which the user wants to distribute it.
- In the approximate time designation section 621C, the temporal length of the produced moving image is designated in seconds according to the user's operation. For example, as shown in FIG. 18, with 60 seconds as the initial value, a reference time such as 6 seconds, 15 seconds, 30 seconds, or 90 seconds can be selected. The approximate time does not have to be set.
- Information about one or more templates is displayed in the template selection section 621D, and one template can be selected using radio buttons or the like.
- The user can easily change the atmosphere of the moving image to a preferred one simply by selecting one desired template from the templates displayed in the template selection section 621D.
- For example, as shown in FIG. 19, one of templates 1 to 8 can be selected, with "no template" as the initial value, and a name and other setting information are registered for each selectable template.
- For each template, information such as moving image cut times, color tones, cut switching transitions, background music, subtitle superimposition positions, character sizes, and fonts is registered.
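- As an illustration, the per-template settings listed above could be held in a structure like the following. The field names and default values are assumptions; the disclosure only states which kinds of information a template registers.

```python
# Illustrative container for the registered template settings.
from dataclasses import dataclass, field

@dataclass
class Template:
    name: str
    cut_seconds: list = field(default_factory=lambda: [5.0, 3.0, 4.0])
    color_tone: str = "neutral"
    transition: str = "crossfade"
    bgm: str = "bgm_01.wav"
    subtitle_position: str = "bottom"
    font: str = "sans-serif"
    font_size: int = 32

template3 = Template(name="Template 3", color_tone="warm", transition="wipe")
```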
- The template selected in the template selection section 621D is preview-played in the template display section 621E.
- This allows the user to intuitively grasp the image of the produced moving image (completed moving image) by viewing a sample of the completed moving image with the designated template applied.
- Editable portions can also be sorted out for each template, for example by making portions such as the cut times uneditable so that priority is given to the user's editing work elsewhere.
- The creation button 621F is a button for instructing project registration.
- When the creation button 621F is pressed, a project for creating a produced moving image according to the settings is registered.
- When the close button 621G is pressed, the setting screen 621 is closed and the calling screen is displayed again.
- FIG. 20 is a diagram showing a second example of an editing screen used when editing moving images.
- The editing screen 711 includes a first area 711A for accepting user operations, a second area 711B for preview playback of the moving image, a third area 711C for making settings related to editing, and a fourth area 711D for timeline editing and transition settings.
- The editing screen 711 also includes a fifth area 711E and a sixth area 711F for performing editing operations on a target captured image, and a seventh area 711G for displaying a list of the uploaded captured images.
- The first area 711A is an area in which buttons and the like are arranged for accepting user operations that instruct the execution of production, correction, writing out, and the like of the produced moving image. For example, when the user wants to re-align the brightness, color tone, volume differences of speech, and the like after replacing the captured images used in the produced moving image, those functions are executed by pressing the automatic creation button. Note that the function corresponding to an operation may be executed not at the timing when the automatic creation button is pressed but at the moment the operation is performed by the user. To output the produced moving image, the export button is pressed.
- The second area 711B is an area where preview playback of the timeline editing performed in the fourth area 711D takes place.
- The third area 711C is an area for making editing settings for the produced moving image as a whole. For example, in the third area 711C, the brightness and color of the entire produced moving image can be changed, and the BGM can be changed. Also, in the third area 711C, the aspect ratio of the produced moving image and the temporal length of the produced moving image (approximate time) can be changed.
- The fourth area 711D is an area for performing cut replacement in the timeline editing, transition settings, and the like. For example, even after executing automatic editing by pressing the automatic creation button, the user can use the fourth area 711D to add or delete the captured images placed in the timeline, change their order, or change the switching transition effect.
- The fifth area 711E and the sixth area 711F are areas for performing editing operations on a target captured image.
- In the sixth area 711F, if the captured image is a moving image, the start and end times of the portion to be extracted from that one moving image can be changed; if the captured image is a still image, the length of time for which that one still image is displayed can be changed.
- The seventh area 711G is an area where a list of the captured images uploaded to the project, or of the registered captured images, is displayed.
- The user can register captured images, such as moving images and still images, and files such as sounds (audio) in the project.
- FIGS. 21 and 22 are diagrams showing examples of the file management screen.
- In a selection area of the file management screen 721 in FIG. 21, a thumbnail image is displayed for each captured image, such as a moving image or still image.
- In a selection area 722A of the file management screen 722 in FIG. 22, a list of captured images such as moving images and still images is displayed.
- The thumbnail display and the list display can be switched by operating the switch button 721B or the switch button 722B.
- The user can use the file management screen 721 or the file management screen 722 to select a desired captured image and press the add button 721C or the add button 722C to register it in a desired project.
- FIG. 23 is a diagram showing an example of a project registration screen.
- When a captured image is uploaded to the cloud server 20, it can be registered in a project at the same time. At this time, the captured image may be registered in the project using the project registration screen 731. Alternatively, captured images that have been uploaded in advance may be registered in the project using the project registration screen 731.
- The captured images registered in the project are displayed in the list of the seventh area 711G on the editing screen 711.
- In the seventh area 711G, when a captured image registered in the project is a moving image, image frames corresponding to times at fixed intervals in the moving image, such as the 0th second, the 5th second, and the 10th second, are displayed. This allows the user to grasp the overall picture of the moving images registered in the project.
- Although a screen for registering sound files is not illustrated, a UI for selecting sound files is provided separately from the file management screens for captured images shown in FIGS. 21 and 22, and sound files can likewise be registered in the project.
- The user can then create a produced moving image by pressing the automatic creation button in the first area 711A.
- By the way, when shooting with the camera 10, it is common for the user to shoot a certain subject repeatedly, two or three times rather than only once.
- Therefore, the feature amounts of the captured images are extracted using AI technology, and the captured images are grouped taking into account the time information of when they were shot.
- The captured images grouped in this manner can be displayed in the seventh area 711G.
- FIG. 24 is a diagram showing a display example of the seventh area 711G when the automatic creation button is pressed.
- each group is expressed as a scene, and after the captured image group (moving image group) of Scene1, the captured image group for each scene such as Scene2, Scene3, . . . is displayed.
- the images taken before the production of the video are not classified, but when the automatic creation button is pressed and the production of the production video is executed, in the basic usage, they are all classified into some scene starting from Scene 1.
- captured images uploaded after production of a produced moving image are first unclassified, and scene classification is performed when moving image production is executed again.
- The user can refer to the editing results displayed in the fourth area 711D and perform video editing work such as, for example, replacing a clip with a preferred moving image or still image, changing a display time or transition, superimposing subtitles or still images, adding or changing background music, and changing brightness and color.
- What is displayed in the fourth area 711D is a timeline, which manages the overall flow of the produced moving image in chronological order.
- The method for determining the moving images and still images placed on the timeline, as well as their display times, transitions, and so on, will be described later with reference to the flowchart of FIG. 26.
- FIG. 25 is a diagram showing a third example of a setting screen used when outputting a moving image.
- The setting screen 811 in FIG. 25 can be used to change output settings such as the aspect ratio and frame rate.
- The setting screen 811 includes an output file name designation section 811A for designating the file name of the produced moving image to be output, an aspect ratio designation section 811B for designating the aspect ratio of the produced moving image, and a frame rate designation section 811C for setting the frame rate of the produced moving image.
- The setting screen 811 also includes a format designation section 811D for setting the format of the produced moving image and a resolution designation section 811E for setting the resolution of the produced moving image.
- The reproduction operation section 811G comprises a seek bar and the like, and is used to control the playback position of the produced moving image (completed moving image) previewed on the moving image display section 811F.
- the cancel button 811H is a button for instructing cancellation of production of the produced video (completed video).
- the output start button 811I is a button for instructing execution of production of a produced moving image (completed moving image).
- Next, the flow of the captured image selection process and the automatic editing process will be described with reference to the flowchart of FIG. 26.
- In step S211, the captured image acquisition unit 251 acquires captured images uploaded via the network 40 from a device such as the camera 10 or the terminal device 30.
- In step S212, the processing unit 200 extracts the feature amount of each captured image using a trained model (for example, a DNN) obtained by machine learning.
- For example, a feature vector can be extracted as the feature amount of a captured image.
- Feature amounts are always extracted when a moving image is uploaded as a captured image, whereas extraction is optional when a still image is uploaded; whether extraction is performed for still images can be changed by a setting. The feature amount of a captured image can be held as a feature grouping (feature_grouping) associated with the same content ID (content_id) as the captured image.
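A minimal sketch of extracting a feature amount and holding it under the image's content_id follows; the disclosure uses a trained DNN, whereas a normalized color histogram stands in here so the example is self-contained, and the record layout is an assumption:

```python
import numpy as np
from PIL import Image  # assumed available for decoding the image

def extract_feature(image_path: str) -> np.ndarray:
    """Stand-in feature extractor: a normalized 8x8x8 RGB histogram.
    Any model that yields a feature vector (e.g. a DNN) fits this slot."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(8, 8, 8),
                             range=((0, 256),) * 3)
    vec = hist.flatten()
    return vec / (np.linalg.norm(vec) + 1e-9)

# Feature held as a feature_grouping sharing the captured image's content_id.
feature_grouping = {
    "content_id": "clip_0001",  # same ID as the captured image
    "feature": extract_feature("clip_0001_frame.jpg").tolist(),
}
```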
- In step S213, the processing unit 200 determines whether or not the automatic creation button in the first area 711A of the editing screen 711 has been pressed. If it is determined in step S213 that the automatic creation button has been pressed, the process proceeds to step S214.
- In step S214, the processing unit 200 determines whether to perform automatic selection. If it is determined in step S214 that automatic selection is to be performed, the process proceeds to step S215.
- In step S215, the captured image selection unit 254 groups the captured images based on their extracted feature amounts and their shooting times.
- In step S216, the captured image selection unit 254 automatically determines, based on the group information, the captured images to be used for the timeline displayed in the fourth area 711D of the editing screen 711. A shot mark added to a captured image can be used in this automatic determination.
- When step S216 ends, the process proceeds to step S217. If it is determined in step S214 that automatic selection is not to be performed, steps S215 and S216 are skipped and the process proceeds to step S217.
- In step S217, the processing unit 200 determines whether to perform automatic brightness correction. If it is determined in step S217 that automatic brightness correction is to be performed, the process proceeds to step S218.
- In step S218, the editing unit 255 uses the first captured image on the timeline displayed in the fourth area 711D of the editing screen 711 as a brightness reference, and corrects the brightness of the second and subsequent captured images on the timeline so that it becomes approximately the same as the brightness of the first captured image.
- the brightness correction method shown here is an example, and other brightness correction methods may be applied.
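As one simple realization of such correction, the mean luminance of each later frame could be gain-matched to the reference frame; the disclosure does not specify an algorithm, so the following is only an illustrative sketch:

```python
import numpy as np

def mean_luma(frame: np.ndarray) -> float:
    """Mean luminance of an RGB frame (BT.601 weights)."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return float(np.mean(0.299 * r + 0.587 * g + 0.114 * b))

def match_brightness(frame: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Scale frame so its mean luminance roughly matches the reference."""
    gain = mean_luma(reference) / max(mean_luma(frame), 1e-6)
    return np.clip(frame.astype(np.float32) * gain, 0, 255).astype(np.uint8)

# Usage: the first clip's frame is the reference for all later clips, e.g.
#   corrected = [match_brightness(f, frames[0]) for f in frames[1:]]
```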
- When step S218 ends, the process proceeds to step S219. If it is determined in step S217 that automatic brightness correction is not to be performed, step S218 is skipped and the process proceeds to step S219.
- In step S219, the processing unit 200 displays the processing results of steps S214 to S218 on the editing screen 711. For example, when automatic selection has been performed ("Yes" in S214; S215 and S216), group information and timeline information are displayed in the fourth area 711D of the editing screen 711 as the processing result. Similarly, when automatic brightness correction has been performed ("Yes" in S217; S218), the brightness correction result is displayed in the fourth area 711D of the editing screen 711.
- When step S219 ends, the series of processes ends.
- Automatic selection is performed in steps S215 and S216 of FIG. 26, but the following processing may be performed, for example. That is, when grouping the captured images, if the time of the first cut, the time of the second and subsequent cuts, and the time of the complete package of the produced video have been set, the number of groups satisfies formula (2) below, and formula (3) is derived from formula (2). By requesting grouping according to the number of groups obtained from formula (3), the captured images can be grouped.

Number of groups - 1 = (complete package time - time of first cut) / time of second cut ... (2)
Number of groups = 1 + (complete package time - time of first cut) / time of second cut ... (3)
- The complete package time can be set as the temporal length of the produced moving image on a setting screen such as the setting screen 611 in FIG. 14 or the setting screen 621 in FIG. 16. On the setting screen 621 of FIG. 16, for example, it can be set by the reference time designation section 621C.
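Expressed as code, formula (3) might look like the following sketch (the function name, the rounding, and the example values are illustrative only):

```python
def number_of_groups(package_sec: float, first_cut_sec: float,
                     later_cut_sec: float) -> int:
    """Formula (3): groups = 1 + (package - first cut) / later cut."""
    return 1 + round((package_sec - first_cut_sec) / later_cut_sec)

# e.g. a 20-second complete package with a 4-second first cut and 4-second
# subsequent cuts yields 1 + (20 - 4) / 4 = 5 groups (cf. groups 1-5 in FIG. 27).
print(number_of_groups(20.0, 4.0, 4.0))  # -> 5
```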
- Next, a captured image is selected from each group. For example, as shown in FIG. 27, one captured image can be selected from each of groups 1 to 5.
- Within each group, captured images with shot marks are selected preferentially. For example, if there are one or more moving images with shot marks in a group, the moving image with the most recent shooting date and time is selected from among them. If there is no captured image with a shot mark in the group, the captured image with the most recent shooting date and time is selected.
- For a moving image with a shot mark, the cutout section is selected so that a target time, such as 3 seconds or 4 seconds, is centered on the time of the shot mark.
- For example, even if the target cut time is 3 seconds, a moving image that is only 2 seconds long is used at its full 2-second length; the resulting deviation of the total time from the target is tolerated. Even a 0.1-second moving image is, for the time being, used at its full 0.1-second length.
- A moving image without a shot mark can be cut out centered on the middle of the moving image.
- For example, a 5-second moving image is cut centered on the 2.5-second point, and an 8-second moving image is cut centered on the 4-second point.
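These cutout rules can be collected into one sketch (a hypothetical helper; times are in seconds, and the clamping behavior at clip boundaries is an assumption):

```python
from typing import Optional, Tuple

def cutout_window(duration: float, target: float = 3.0,
                  shot_mark: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) of the section to extract from one clip:
    centered on the shot mark if present, otherwise on the clip middle;
    clips shorter than the target are used at full length."""
    if duration <= target:
        return 0.0, duration  # e.g. a 2-second clip is used whole
    center = shot_mark if shot_mark is not None else duration / 2.0
    start = min(max(center - target / 2.0, 0.0), duration - target)
    return start, start + target

print(cutout_window(5.0))                 # (1.0, 4.0): centered on 2.5 s
print(cutout_window(8.0, shot_mark=6.5))  # (5.0, 8.0): clamped to clip end
```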
- Since a still image has no concept of time, it is possible to decide, for example, to display it continuously for 3 seconds in accordance with the cut.
- In this way, the captured images are grouped based on the temporal length of the produced video (the complete package time), and the captured images to be used for the produced video are selected from among the grouped captured images based on the metadata (shot marks).
- Here, shot marks are used as the metadata, but other parameters (for example, camera parameters) may be used.
- Suppose, for example, that a video cut is 4 seconds long and the transition connecting cuts is 1 second long.
- In A of FIG. 28, the first 3 seconds of the cut are the normal display period, and the period from 3 to 4 seconds also serves as the transition period.
- In B of FIG. 28, the 1-second transition is not added within the 4-second moving image cut.
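The arithmetic behind the two schemes can be sketched as follows; how FIG. 28B lays out the transition is one reading of the text, so both variants are labeled as interpretations rather than the disclosure's definition:

```python
def timeline_length(num_cuts: int, cut_sec: float = 4.0,
                    transition_sec: float = 1.0, scheme: str = "A") -> float:
    """Total timeline length for cuts joined by transitions.
    Scheme "A" (one reading of FIG. 28A): the final transition_sec of each
    cut doubles as the transition, so adjacent cuts overlap.
    Scheme "B" (one reading of FIG. 28B): the transition is not taken out
    of the cut; full-length cuts are joined with the transition between."""
    if scheme == "A":
        return num_cuts * cut_sec - (num_cuts - 1) * transition_sec
    return num_cuts * cut_sec + (num_cuts - 1) * transition_sec

print(timeline_length(5, scheme="A"))  # 16.0 seconds
print(timeline_length(5, scheme="B"))  # 24.0 seconds
```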
- In step S218 of FIG. 26, brightness correction is performed as automatic quality correction, but the following processing may be performed, for example. That is, when automatic selection is requested, the captured images selected during automatic selection can be used as the targets from which recommended values for brightness correction are obtained.
- When the captured image is a moving image, it is necessary to specify which image frame to use. For a moving image with shot marks, the image frame at the latest of the shot mark times can be used; for a moving image without a shot mark, an image frame at the middle of the moving image (for example, the 2-second frame of a 4-second moving image) can be used. If the captured image is a still image, it consists of a single image frame, so there is no need to specify a time.
- When automatic selection is not requested and the captured image is a moving image, for example, the image frame at the midpoint between the clipping start time and the clipping end time set on a UI such as the setting screen can be used. If the captured image is a still image, there is again no need to specify a time.
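The rules above for picking the frame from which the recommended brightness value is obtained can be summarized in a small sketch (a hypothetical helper mirroring the cases in the text):

```python
from typing import Optional, Sequence

def reference_frame_time(duration: float,
                         shot_marks: Sequence[float] = (),
                         clip_in: Optional[float] = None,
                         clip_out: Optional[float] = None) -> float:
    """Time (seconds) of the frame used for the brightness recommendation:
    - in/out points set manually on the UI -> their midpoint
    - moving image with shot marks        -> the latest shot mark time
    - moving image without shot marks     -> the middle of the clip"""
    if clip_in is not None and clip_out is not None:
        return (clip_in + clip_out) / 2.0
    if shot_marks:
        return max(shot_marks)
    return duration / 2.0  # e.g. the 2-second frame of a 4-second clip
```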
- In step S218 of FIG. 26, brightness correction was exemplified as the correction processing for captured images, but other correction processing, such as hue correction, may be performed. Furthermore, processing for reducing noise such as wind noise and processing for equalizing the speech volume across the moving images may be added. Camera shake correction or the like may also be added; for example, when camera shake correction is set to ON on a setting screen or the like, it is performed for all the moving images.
- In step S211 of FIG. 26, as a method of uploading a captured image file, a method can be used in which, as described above, the user performs an operation such as pressing a button or dragging and dropping on the UI of the web browser on the terminal device 30 to upload the file to the cloud server 20 via the network 40.
- Alternatively, a method may be used in which captured images captured by the camera 10 are automatically uploaded to the cloud server 20 via the network 40. In this case, a list of the captured images in the camera 10 may be displayed on the web browser so that the user can select a desired captured image.
- When the camera 10 supports proxy recording, that is, a function of simultaneously recording a main image (high-resolution captured image) and a proxy image (low-resolution captured image), the proxy image can be uploaded to the cloud server 20 first and used for automatic editing, and the main image can be uploaded to the cloud server 20 by the time the produced moving image (completed moving image) is actually created. The time required for communication can thereby be reduced.
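This proxy-first flow might be organized as in the following sketch; the upload, editing, and rendering functions are stand-ins (assumptions), and the point is only that editing proceeds on the proxy while the main image uploads in the background:

```python
import threading
import time

def upload(path: str) -> None:
    """Stand-in for the actual transfer to the cloud server 20."""
    time.sleep(0.1)  # simulate network time
    print(f"uploaded {path}")

def automatic_edit(proxy_path: str) -> list:
    """Stand-in: edit decisions (cut points etc.) made on the proxy."""
    return [("cut", 0.0, 3.0)]

def render_final(main_path: str, decisions: list) -> None:
    """Stand-in: render the completed video from the high-resolution main."""
    print(f"rendered final video from {main_path} using {decisions}")

def produce_with_proxy(proxy_path: str, main_path: str) -> None:
    upload(proxy_path)                      # small file: arrives quickly
    bg = threading.Thread(target=upload, args=(main_path,))
    bg.start()                              # large file uploads in background
    decisions = automatic_edit(proxy_path)  # editing proceeds on the proxy
    bg.join()                               # main must arrive before rendering
    render_final(main_path, decisions)

produce_with_proxy("clip_0001_proxy.mp4", "clip_0001_main.mp4")
```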
- In this way, even a user without shooting techniques or special equipment can achieve high-quality video production through automatic correction using camera metadata.
- In addition, the structure of a moving image such as an advertisement can easily be created using a template, or captured images can be inserted into a template that utilizes camera metadata.
- Furthermore, the selection function can support clip sorting and scene selection.
- As a result, simply by photographing a desired subject, the user can have a produced moving image such as an advertisement created automatically once photographing is finished.
- The processing performed in the editing processing described above is an example; for example, basic editing functions such as undo/redo and speed changes such as slow motion and speed-up may be added.
- Undo means canceling the immediately preceding operation and returning to the state before that operation.
- Redo means restoring an operation canceled by undo to its original state. Effects such as pan, tilt, and zoom may also be added during automatic selection of moving images and creation of the timeline. In addition, a function of automatically transcribing speech into text may be added.
- In the above description, the processing unit 200 of the cloud server 20 executes processing such as the editing processing, but the processing may be executed by a device other than the cloud server 20. For example, the processing unit of the terminal device 30 may have functions corresponding to the processing unit 200 and execute all or part of the processing such as the editing processing.
- In the above description, the screens (setting screens, editing screens, etc.) are web pages provided from the cloud server 20 to the terminal device 30 via the network 40 and displayed as the UI of the web browser, but the UI on the terminal side is not limited to this; for example, the screens may be displayed by dedicated software, including so-called native applications.
- The program executed by the computer can be provided by being recorded on a removable recording medium such as package media, for example. The program can also be provided via a wired or wireless transmission medium such as a LAN, the Internet, or digital satellite broadcasting.
- The program can be installed in the storage unit via the input/output I/F by loading the removable recording medium into the drive. The program can also be received by the communication unit via a wired or wireless transmission medium and installed in the storage unit. Alternatively, the program can be installed in advance in the ROM or the storage unit.
- The processing performed by the computer in accordance with the program does not necessarily have to be performed in chronological order following the order described in the flowcharts. That is, the processing performed by the computer in accordance with the program also includes processing executed in parallel or individually (for example, parallel processing or object-based processing).
- the program may be processed by one computer (processor), or may be processed by a plurality of computers in a distributed manner. Furthermore, the program may be transferred to and executed on a remote computer.
- In this specification, the term "automatic" means that a device such as the cloud server 20 performs processing without the user's direct operation, and the term "manual" means that processing is performed through the user's direct operation.
- The effects described in this specification are merely examples and are not limiting; other effects may be provided.
- In this specification, a system means a set of multiple components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
- Note that the present disclosure can also be configured as follows.
- (1) An image processing apparatus including a processing unit that acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- (2) The image processing apparatus according to (1), wherein the metadata includes camera metadata added by the camera that captured the captured image.
- (3) The image processing apparatus according to (2), wherein the metadata includes a shot mark attached to the captured image in accordance with a user's operation.
- (4) The image processing apparatus according to (3), wherein the processing unit groups the acquired captured images based on the temporal length of the moving image, and selects, based on the shot marks, captured images to be used for producing the moving image from among the grouped captured images.
- (5) The image processing apparatus according to any one of (1) to (4), wherein the temporal length of the moving image is set on the setting screen in accordance with a user's operation before production of the moving image.
- (6) The image processing apparatus according to any one of (1) to (4), wherein the temporal length of the moving image is set on the setting screen in accordance with a user's operation before the captured images are captured.
- (7) The image processing apparatus according to any one of (1) to (6), wherein the temporal length of the moving image can be changed on an editing screen for editing the moving image.
- (8) The image processing apparatus according to any one of (1) to (7), wherein the number of captured images used in producing the moving image is further set on the setting screen, and the processing unit selects the captured images based on the set temporal length of the moving image, the set number of captured images, and the metadata.
- (9) The image processing apparatus according to any one of (1) to (8), wherein an aspect ratio is further set on the setting screen, and the processing unit produces the moving image in accordance with the set aspect ratio.
- (10) The image processing apparatus according to any one of (1) to (9), wherein the captured image is a moving image or a still image.
- (11) The image processing apparatus according to any one of (1) to (10), configured as a server that processes the captured images, captured by a camera operated by a user and received via a network, and that transmits the produced moving image via a network to a terminal device operated by the user.
- (12) The image processing apparatus according to (11), wherein the setting screen is displayed on the terminal device and operated by the user.
- (13) An image processing method in which an image processing device acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- (14) A program that causes a computer to function as a processing unit that acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- 1 video production system, 10 camera, 20 cloud server, 30 terminal device, 40-1, 40-2, 40 network, 200 processing unit, 211 CPU, 251 captured image acquisition unit, 252 metadata extraction unit, 253 operation information acquisition unit, 254 captured image selection unit, 255 editing unit
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Signal Processing For Recording (AREA)
- Studio Devices (AREA)
Abstract
Description
FIG. 1 is a diagram showing a configuration example of an embodiment of a video production system to which the present disclosure is applied.
FIG. 2 is a block diagram showing a configuration example of the camera 10 of FIG. 1.
FIG. 3 is a block diagram showing a configuration example of the cloud server 20 of FIG. 1.
FIG. 4 is a block diagram showing a configuration example of the terminal device 30 of FIG. 1.
In the video production system 1, files of captured images shot by the camera 10 are uploaded to the cloud server 20 via the network 40 and processed; they are uploaded by, for example, the method shown in FIG. 5.
FIG. 9 is a flowchart showing the flow of the video production service provided by the video production system 1.
(b) Upload moving images, still images, sounds (audio), and LUT files.
(c) Press the automatic creation button on a screen such as the editing screen.
(d) Perform manual editing as necessary, such as replacing clips with preferred moving images or still images and changing cutout times.
(e) In accordance with the replaced moving images and still images, correct brightness and hue again as necessary, and additionally execute correction processing such as camera shake correction, wind noise reduction, and speech volume equalization, to produce the produced moving image (completed moving image).
FIG. 13 is a block diagram showing a functional configuration example of the processing unit 200 in the video production system 1. For example, the processing unit 200 is realized by a processor such as the CPU 211 or a GPU of the cloud server 20 executing a program such as a video production program. Alternatively, the processing unit 200 may be realized as a dedicated circuit.
FIG. 14 is a diagram showing a first example of a setting screen used before shooting. For example, the setting screen is displayed on the display unit 331 of the terminal device 30.
On the editing screen 615 of FIG. 15, various information about the produced moving image obtained by automatically editing the captured images shot after the settings were made, in accordance with the contents of the template set in advance on the setting screens 611 and 612 of FIG. 14, is displayed. The various information displayed on the editing screen 615 can be manually edited in accordance with user operations.
FIG. 16 is a diagram showing a second example of a setting screen used at the time of video production. The setting screen of FIG. 16 will be described with reference to the tables of FIGS. 17 to 19 as appropriate.
FIG. 20 is a diagram showing a second example of an editing screen used at the time of video editing.
Claims (14)
- 1. An image processing apparatus comprising a processing unit that acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- 2. The image processing apparatus according to claim 1, wherein the metadata includes camera metadata added by the camera that captured the captured image.
- 3. The image processing apparatus according to claim 2, wherein the metadata includes a shot mark attached to the captured image in accordance with a user's operation.
- 4. The image processing apparatus according to claim 3, wherein the processing unit groups the acquired captured images based on the temporal length of the moving image, and selects, based on the shot marks, captured images to be used for producing the moving image from among the grouped captured images.
- 5. The image processing apparatus according to claim 1, wherein the temporal length of the moving image is set on the setting screen in accordance with a user's operation before production of the moving image.
- 6. The image processing apparatus according to claim 1, wherein the temporal length of the moving image is set on the setting screen in accordance with a user's operation before the captured images are captured.
- 7. The image processing apparatus according to claim 1, wherein the temporal length of the moving image can be changed on an editing screen for editing the moving image.
- 8. The image processing apparatus according to claim 1, wherein the number of captured images used in producing the moving image is further set on the setting screen, and the processing unit selects the captured images based on the set temporal length of the moving image, the set number of captured images, and the metadata.
- 9. The image processing apparatus according to claim 1, wherein an aspect ratio is further set on the setting screen, and the processing unit produces the moving image in accordance with the set aspect ratio.
- 10. The image processing apparatus according to claim 1, wherein the captured image is a moving image or a still image.
- 11. The image processing apparatus according to claim 1, configured as a server that processes the captured images, captured by a camera operated by a user and received via a network, and that transmits the produced moving image via a network to a terminal device operated by the user.
- 12. The image processing apparatus according to claim 11, wherein the setting screen is displayed on the terminal device and operated by the user.
- 13. An image processing method in which an image processing device acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
- 14. A program that causes a computer to function as a processing unit that acquires captured images to which metadata is added, selects, based on the temporal length of a moving image to be produced set on a setting screen and the metadata, captured images to be used for producing the moving image from among the acquired captured images, and produces the moving image using the selected captured images.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/558,295 US20240233770A1 (en) | 2021-05-12 | 2021-12-01 | Image processing apparatus, image processing method, and program |
| JP2023520754A JPWO2022239281A1 (ja) | 2021-05-12 | 2021-12-01 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163187569P | 2021-05-12 | 2021-05-12 | |
| US63/187,569 | 2021-05-12 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022239281A1 true WO2022239281A1 (ja) | 2022-11-17 |
Family
ID=84028095
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2021/044138 Ceased WO2022239281A1 (ja) | 2021-05-12 | 2021-12-01 | Image processing device, image processing method, and program |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20240233770A1 (ja) |
| JP (1) | JPWO2022239281A1 (ja) |
| WO (1) | WO2022239281A1 (ja) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117857719A (zh) * | 2022-09-30 | 2024-04-09 | 北京字跳网络技术有限公司 | Video material editing method and apparatus |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2015504629A (ja) * | 2011-11-14 | 2015-02-12 | Apple Inc. | Generation of multimedia clips |
| JP2016036078A (ja) * | 2014-08-01 | 2016-03-17 | Mixi, Inc. | Information processing device, and control method and control program for information processing device |
| JP2018514127A (ja) * | 2015-03-24 | 2018-05-31 | Facebook, Inc. | Systems and methods for providing playback of selected video segments |
| JP2020182164A (ja) * | 2019-04-26 | 2020-11-05 | Canon Inc. | Imaging device, control method therefor, and program |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11956516B2 (en) * | 2015-04-16 | 2024-04-09 | W.S.C. Sports Technologies Ltd. | System and method for creating and distributing multimedia content |
| US10440276B2 (en) * | 2017-11-02 | 2019-10-08 | Adobe Inc. | Generating image previews based on capture information |
| US20210295875A1 (en) * | 2018-08-07 | 2021-09-23 | Justin Garak | Touch panel based video editing |
| JP7491297B2 (ja) * | 2019-02-21 | 2024-05-28 | ソニーグループ株式会社 | 情報処理装置、情報処理方法、プログラム |
- 2021-12-01 WO: PCT/JP2021/044138 (WO2022239281A1, ja), not active, ceased
- 2021-12-01 US: US 18/558,295 (US20240233770A1, en), active, pending
- 2021-12-01 JP: JP 2023520754 (JPWO2022239281A1, ja), active, pending
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2022239281A1 (ja) | 2022-11-17 |
| US20240233770A1 (en) | 2024-07-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21942009; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023520754; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 18558295; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21942009; Country of ref document: EP; Kind code of ref document: A1 |